Abstract

For every fixed constant $\alpha > 0$, we design an algorithm for computing the $k$-sparse Walsh-Hadamard transform of an $N$-dimensional vector $x \in \mathbb{R}^N$ in time $k^{1+\alpha}(\log N)^{O(1)}$. Specifically, the algorithm is given query access to $x$ and computes a $k$-sparse $\tilde{x} \in \mathbb{R}^N$ satisfying $\|\tilde{x} - \hat{x}\|_1 \le c\,\|\hat{x} - H_k(\hat{x})\|_1$, for an absolute constant $c > 0$, where $\hat{x}$ is the transform of $x$ and $H_k(\hat{x})$ is its best $k$-sparse approximation. Our algorithm is fully deterministic and only uses non-adaptive queries to $x$ (i.e., all queries are determined and performed in parallel when the algorithm starts).

An important technical tool that we use is a construction of nearly optimal and linear lossless condensers, which is a careful instantiation of the GUV condenser (Guruswami, Umans, Vadhan, JACM 2009). Moreover, we design a deterministic and non-adaptive compressed sensing scheme based on general lossless condensers that is equipped with a fast reconstruction algorithm running in time $k^{1+\alpha}(\log N)^{O(1)}$ (for the GUV-based condenser) and is of independent interest.

Our scheme significantly simplifies and improves an earlier expander-based construction due to 
Berinde, Gilbert, Indyk, Karloff, Strauss (Allerton 2008). 

Our methods use linear lossless condensers in a black-box fashion; therefore, any future improvement on explicit constructions of such condensers would immediately translate to improved parameters in our framework (potentially leading to $k(\log N)^{O(1)}$ reconstruction time with a reduced exponent in the poly-logarithmic factor, and eliminating the extra parameter $\alpha$).

Finally, by allowing the algorithm to use randomness, while still using non-adaptive queries, the running time of the algorithm can be improved to $\tilde{O}(k \log^3 N)$.
1 Introduction


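The Discrete Walsh-Hadamard transform (henceforth the Hadamard Transform or DHT) of a vector $x \in \mathbb{R}^N$, where $N = 2^n$, is a vector $\hat{x} \in \mathbb{R}^N$ defined as follows:

$$\hat{x}(j) := \frac{1}{\sqrt{N}} \sum_{i \in \mathbb{F}_2^n} (-1)^{\langle i, j \rangle}\, x(i), \tag{1}$$

where the coordinate positions are indexed by the elements of $\mathbb{F}_2^n$, $x(i)$ denoting the entry at position $i \in \mathbb{F}_2^n$, and the inner product $\langle i, j \rangle$ is over $\mathbb{F}_2$. Equivalently, the Hadamard transform is a variation of the Discrete Fourier transform (DFT) defined over the hypercube $\mathbb{F}_2^n$. We use the notation $\hat{x} = \mathrm{DHT}(x)$.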
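Definition (1) can be evaluated with the same butterfly structure as the FFT. The following minimal Python sketch (the names `dht` and `dht_naive` are ours, not the paper's) assumes that elements of $\mathbb{F}_2^n$ are encoded as the integers $0, \ldots, N-1$, so that $\langle i, j \rangle$ is the parity of the bitwise AND of $i$ and $j$. It runs in $O(N \log N)$ time and is cross-checked against a direct evaluation of (1).

```python
import math


def dht(x):
    """Normalized Walsh-Hadamard transform, equation (1), in O(N log N) time.

    Indices 0..N-1 encode vectors of F_2^n (bit k of the integer is
    coordinate k), so <i, j> is the parity of popcount(i & j).
    """
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of two")
    y = [float(v) for v in x]
    h = 1
    while h < N:
        # Butterfly: combine entries whose indices differ only in one bit.
        for start in range(0, N, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(N)
    return [v * scale for v in y]


def dht_naive(x):
    """Direct O(N^2) evaluation of (1), used only for cross-checking."""
    N = len(x)
    return [
        sum((-1) ** bin(i & j).count("1") * x[i] for i in range(N)) / math.sqrt(N)
        for j in range(N)
    ]


x = [3.0, -1.0, 0.5, 2.0, 0.0, 4.0, -2.5, 1.0]
x_hat = dht(x)
assert all(abs(u - v) < 1e-9 for u, v in zip(x_hat, dht_naive(x)))
# Self-inverse: DHT(DHT(x)) = x up to rounding.
assert all(abs(u - v) < 1e-9 for u, v in zip(dht(x_hat), x))
```

With the $1/\sqrt{N}$ normalization, applying `dht` twice returns the input up to floating-point error; this is the self-inverse property that later allows the roles of $x$ and $\hat{x}$ to be interchanged.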

The standard divide-and-conquer approach of the Fast Fourier Transform (FFT) can be applied to the Hadamard transform as well, to compute the DHT in time $O(N \log N)$. In many applications,
however, most of the Fourier coefficients of a signal are small or equal to zero, i.e., the output of 
the DFT is (approximately) sparse. In such scenarios one can hope to design an algorithm with a 
running time that is sub-linear in the signal length N. Such algorithms would significantly improve 
the performance of systems that rely on processing of sparse signals. 

The goal of designing efficient DFT and DHT algorithms for (approximately) sparse signals has been the subject of a large body of research, starting with the celebrated Goldreich-Levin theorem [7] in complexity theory¹. The last decade has witnessed the development of several highly efficient sub-linear time sparse Fourier transform algorithms. These recent algorithms have mostly focused on the Discrete Fourier transform (DFT) over the cyclic group (and techniques that only apply to this group), whereas some (for example, [16]) have focused on the Hadamard transform. In terms of the running time, the best bounds to date were obtained in [9], which showed that a $k$-sparse approximation of the DFT can be computed in time $O(k \log N \log(N/k))$, or even in $O(k \log N)$ time if the spectrum of the signal has at most $k$ non-zero coefficients. These developments as well as some of their applications have been summarized in two surveys [6] and [5].

While most of the aforementioned algorithms are randomized, from both theoretical and practical viewpoints it is desirable to design deterministic algorithms for the problem. Although such algorithms have been the subject of several works, including [1, 13, 12], there is a considerable efficiency gap between the deterministic sparse Fourier Transform algorithms and the randomized ones. Specifically, the best known deterministic algorithm, given in [12], finds a $k$-sparse approximation of the DFT of a signal in time $O(k^2 (\log N)^{O(1)})$; i.e., its running time is quadratic in the signal sparsity. Designing a deterministic algorithm with reduced run time dependence on the signal sparsity has been recognized as a challenging open problem in the area (e.g., see Question 2 in [11]).


¹ This result is also known in the coding theory community as a list decoding algorithm for the Hadamard code, and is crucially used in computational learning as a part of the Kushilevitz-Mansour algorithm for learning low-degree Boolean functions [15].





1.1 Our result 


In this paper we make considerable progress on this question by designing a deterministic algorithm for DHT that runs in time $O(k^{1+\alpha}(\log N)^{O(1)})$. Since our main interest is optimizing the exponent of $k$ in the running time of the DHT algorithm, the reader may think of a parameter regime where the sparsity parameter $k$ is not too insignificant compared to the dimension $N$ (e.g., we would like to have $k \ge (\log N)^{\omega(1)}$), so that reducing the exponent of $k$ at the cost of incurring additional poly-logarithmic factors in $N$ would be feasible².

To describe the result formally, we will consider a formulation of the problem where the algorithm is given query access to $x$ and the goal is to approximate the largest $k$ terms of $\hat{x}$ using a deterministic sub-linear time algorithm³. More precisely, given an integer parameter $k$ and query access to $x$, we wish to compute a vector $\tilde{x} \in \mathbb{R}^N$ such that, for some absolute constant $c > 0$,

$$\|\tilde{x} - \hat{x}\|_1 \le c \cdot \|H_k(\hat{x}) - \hat{x}\|_1, \tag{2}$$

where we use $H_k(\hat{x})$ to denote the approximation of $\hat{x}$ to the $k$ largest-magnitude coordinates; i.e., $H_k(\hat{x}) \in \mathbb{R}^N$ is only supported on the $k$ largest (in absolute value) coefficients of $\hat{x}$ and is equal to $\hat{x}$ in those positions. Note that if the spectrum $\hat{x}$ has at most $k$ non-zero coefficients, then $H_k(\hat{x}) = \hat{x}$ and therefore the recovery is exact, i.e., $\tilde{x} = \hat{x}$. The goal formulated in (2) is the so-called $\ell_1/\ell_1$ recovery in the sparse recovery literature. In general, one may think of $\ell_p/\ell_q$ recovery, where the norm on the left-hand side (resp., right-hand side) of (2) is $\ell_p$ (resp., $\ell_q$), such as $\ell_2/\ell_1$ or $\ell_2/\ell_2$. However, in this work we only address the $\ell_1/\ell_1$ model as formulated in (2) (for a survey of different objectives and a comparison between them, see [4]).
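To make guarantee (2) concrete, here is a small Python sketch of $H_k$ together with a check of the $\ell_1/\ell_1$ condition for a candidate estimate. The helper names are hypothetical, and ties among equal magnitudes are broken toward the smaller index, which is an arbitrary choice made only for this illustration.

```python
def best_k_sparse(v, k):
    """H_k(v): keep the k largest-magnitude entries of v and zero the rest.

    Ties are broken toward the smaller index (an arbitrary choice).
    """
    keep = sorted(range(len(v)), key=lambda i: (-abs(v[i]), i))[:k]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out


def l1_dist(u, w):
    """||u - w||_1 for equal-length lists."""
    return sum(abs(a - b) for a, b in zip(u, w))


def satisfies_guarantee(estimate, target, k, c):
    """Check (2): ||estimate - target||_1 <= c * ||H_k(target) - target||_1."""
    return l1_dist(estimate, target) <= c * l1_dist(best_k_sparse(target, k), target)


target = [5.0, -0.1, 0.0, 3.0, 0.2]
print(best_k_sparse(target, 2))  # [5.0, 0.0, 0.0, 3.0, 0.0]
```

If `target` is exactly $k$-sparse, the right-hand side of (2) is zero, so only the exact answer passes the check, matching the remark above that recovery is then exact.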

The following statement formally captures our main result. 

Theorem 1. For every fixed constant $\alpha > 0$, there is a deterministic algorithm as follows. Let $N = 2^n$ and $k \le N$ be positive integers. Then, given (non-adaptive) query access to any $x \in \mathbb{R}^N$ where each coefficient of $x$ is $n^{O(1)}$ bits long, the algorithm runs in time $k^{1+\alpha} n^{O(1)}$ and outputs $\tilde{x} \in \mathbb{R}^N$ that satisfies (2) (where $\hat{x} = \mathrm{DHT}(x)$) for some absolute constant $c > 0$.

Remark 2. The parameter $\alpha$ in the above result is arbitrary as long as it is an absolute positive constant; for example, one may fix $\alpha = 0.1$ throughout the paper. We remark that this parameter appears not because of our general techniques but solely as an artifact of a particular state-of-the-art family of unbalanced expander graphs (due to Guruswami, Umans, and Vadhan [8]) that we use as a part of the algorithm (as further explained below in the techniques section). Since we use such expander graphs as a black box, any future progress on the construction of unbalanced expander graphs would immediately improve the running time achieved by Theorem 1, potentially leading to

² For this reason, and in favor of the clarity and modularity of presentation, for the most part we do not attempt to optimize the exact constant in the exponent of the $(\log N)^{O(1)}$ factor.

³ Since the Hadamard transform is its own inverse, we can interchange the roles of $x$ and $\hat{x}$, so the same algorithm can be used to approximate the largest $k$ terms of $x$ given query access to $\hat{x}$.
a nearly optimal time of $k(\log N)^{O(1)}$, with linear dependence on the sparsity parameter $k$, which would be the best to hope for.

In the running time reported by Theorem 1, the $O(1)$ in the exponent of $n$ hides a factor depending on $1/\alpha$; i.e., the running time can more precisely be written as $k^{1+\alpha} n^{O(1/\alpha)}$. However, since $\alpha$ is taken to be an absolute constant, this in turn asymptotically simplifies to $k^{1+\alpha} n^{O(1)}$.

Since our main focus in this work is optimizing the exponent of $k$ (and we regard the sparsity $k$ as not too small compared to $N$), we have not attempted to optimize the exponent of $\log N$ in the running time. However, as we will see in Section 5.1, if one is willing to use randomness in the algorithm, the running time can be significantly improved (eliminating the need for the parameter $\alpha$) using a currently existing family of explicit expander graphs (based on the Leftover Hash Lemma). □

As discussed in Remark 2 above, our algorithm employs state-of-the-art constructions of explicit lossless expander graphs that to this date remain sub-optimal, resulting in a rather large exponent in the $\log N$ factor of the asymptotic running time estimate. Even though the main focus of this article is fully deterministic algorithms for fast recovery of the Discrete Hadamard Transform, we further observe that the same algorithm can be adapted to run substantially faster using randomness and sub-optimal lossless expander graphs, such as the family of expanders obtained from the Leftover Hash Lemma. As a result, we obtain the following improvement over the deterministic version of our algorithm.

Theorem 3. There is a randomized algorithm that, given integers $k, n$ (where $k \le n$) and (non-adaptive) query access to any $x \in \mathbb{R}^N$ (where $N := 2^n$ and each coefficient of $x$ is $O(n)$ bits long), outputs $\tilde{x} \in \mathbb{R}^N$ that, with probability at least $1 - o(1)$ over the internal random coin tosses of the algorithm, satisfies (2) for some absolute constant $c > 0$ and $\hat{x} = \mathrm{DHT}(x)$. Moreover, the algorithm performs a worst-case $O(kn^3 (\log k)(\log n)) = \tilde{O}(k(\log N)^3)$ arithmetic operations.

1.2 Techniques 

Most of the recent sparse Fourier transform algorithms (both randomized and deterministic) are 
based on a form of “binning”. At a high level, sparse Fourier algorithms work by mapping (binning) 
the coefficients into a small number of bins. Since the signal is sparse, each bin is likely to have 
only one large coefficient, which can then be located (to find its position) and estimated (to find its 
value). The key requirement is that the binning process needs to be performed using few samples of $x$, to minimize the running time. Furthermore, since the estimation step typically introduces some
error, the process is repeated several times, either in parallel (where the results of independent 
trials are aggregated at the end) or iteratively (where the identified coefficients are eliminated 
before proceeding to the next step). 

As described above, the best previous deterministic algorithm for the sparse Fourier Transform (over the cyclic group $\mathbb{Z}_N$), given in [12], runs in time $k^2 \cdot (\log N)^{O(1)}$. The algorithm satisfies the guarantee⁴ in (2). The algorithm follows the aforementioned approach, where binning is implemented by aliasing; i.e., by computing a signal $y$ such that $y_j = \sum_{i:\, i \bmod p = j} \hat{x}_i$, where $p$ denotes the number of bins. To ensure that the coefficients are isolated by the mapping, this process is repeated in parallel for several values of $p = p_1, p_2, \ldots, p_t$. Each $p_i$ is greater than $k$ to ensure that there are more bins than elements. Furthermore, the number of different aliasing patterns $t$ must be greater than $k$ as well, as otherwise a fixed coefficient could always collide with one of the other $k$ coefficients. As a result, this approach requires more than $k^2$ bins, which results in quadratic running time. One can reduce the number of bins by resorting to randomization: the algorithm can select only some of the $p_i$'s uniformly at random and still ensure that a fixed coefficient does not collide with any other coefficient with constant probability. In the deterministic case, however, it is easy to see that one needs to use $\Omega(k)$ mappings to isolate each coefficient, and thus the analysis of the algorithm in [12] is essentially tight.
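The aliasing-based binning described above can be illustrated on an explicit coefficient vector. This toy Python sketch uses hypothetical helper names and works on the coefficients directly (in the algorithm of [12] the bins are obtained from samples of the signal); it computes the bins for one modulus $p$ and tests whether a set of moduli isolates a given coefficient from the rest of the support.

```python
def alias_bins(coeffs, p):
    """Bin j receives the sum of coeffs[i] over all i with i mod p == j."""
    bins = [0.0] * p
    for i, value in enumerate(coeffs):
        bins[i % p] += value
    return bins


def isolated_by(support, i, moduli):
    """True if some modulus p in `moduli` puts index i in a bin that
    contains no other index from `support`."""
    others = [s for s in support if s != i]
    return any(all((s - i) % p != 0 for s in others) for p in moduli)


coeffs = [0.0] * 12
coeffs[0], coeffs[6], coeffs[10] = 1.0, 2.0, 4.0
print(alias_bins(coeffs, 3))  # [3.0, 4.0, 0.0]: indices 0 and 6 share bin 0
```

For the support $\{0, 6, 10\}$, index 0 collides with 6 modulo 3 and with 10 modulo 5, but is isolated modulo 4; a deterministic scheme must supply enough moduli so that every coefficient of every $k$-sparse support is isolated by at least one of them.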

In order to reduce the running time, we need to reduce the total number of mappings. To this end we relax the requirements imposed on the mappings. Specifically, we will require that the union of all coefficients-to-bins mappings forms a good expander graph (see Section 2 for the formal definition). Expansion is a natural property to require in this context, as it is known that there exist expanders that are induced by only $(\log N)^{O(1)}$ mappings but that nevertheless lead to near-optimal sparse recovery schemes [2]. The difficulty, however, is that for our purpose we need to simulate those mappings on coefficients of the signal $x$, even though we can only access the spectrum $\hat{x}$ of $x$. Thus, unlike in [2], in our case we cannot use arbitrary “black box” expanders induced by arbitrary mappings. Fortunately, there is a class of mappings that are easy to implement in our context, namely the class of linear mappings.

In this paper, we first show that an observation by one of the authors (as reported in [3]) implies that there exist explicit expanders that are induced by a small number of linear mappings. From this we conclude that there exists an algorithm that makes only $k^{1+\alpha}(\log N)^{O(1)}$ queries to $\hat{x}$ and finds a solution satisfying (2). However, the expander construction alone does not yield an efficient algorithm. To obtain such an algorithm, we augment the expander construction with an extra set of queries that enables us to quickly identify the large coefficients of $x$. The recovery procedure that uses those queries is iterative, and the general approach is similar to the algorithm given in Appendix A of [2]. However, our procedure and the analysis are considerably simpler (thanks to the fact that we only use the so-called Restricted Isometry Property (RIP) for the $\ell_1$ norm instead of $\ell_p$ for $p > 1$). Moreover, our particular construction is immediately extendable for use in the Hadamard transform problem (due to the linearity properties).

The rest of the article is organized as follows. Section 2 discusses notation and the straightforward observation that the sparse DHT problem reduces to compressed sensing with query access

⁴ Technically, the guarantee proven in [12] is somewhat different; namely, it shows that $\|\tilde{x} - \hat{x}\|_2 \le \|H_k(\hat{x}) - \hat{x}\|_2 + \frac{1}{\sqrt{k}} \cdot \|H_k(\hat{x}) - \hat{x}\|_1$. However, the guarantee of (2) can be shown as well [Mark Iwen, personal communication]. In general, the guarantee of (2) is easier to show than the guarantee in [12].
to the Discrete Hadamard Transform of the underlying sparse signal. The notions of Restricted Isometry Property, lossless condensers, and unbalanced expander graphs are also introduced in this section. Section 3 focuses on the sample complexity; i.e., the number of (non-adaptive) queries that the compressed sensing algorithm (obtained by the above reduction) makes in order to reconstruct the underlying sparse signal. Section 4 builds on the results of the preceding section and describes our main (deterministic and sublinear-time) algorithm to efficiently reconstruct the sparse signal from the obtained measurements. Finally, Section 5 observes that the performance of the algorithm can be improved when it is allowed to use randomness. Although the main focus of this article is on deterministic algorithms, the improvement using randomness comes as an added bonus that we believe is worth mentioning.

2 Preliminaries 

Notation. Let $N := 2^n$ and $x \in \mathbb{R}^N$. We index the entries of $x$ by elements of $\mathbb{F}_2^n$ and refer to $x(i)$, for $i \in \mathbb{F}_2^n$, as the entry of $x$ at the $i$th coordinate. The notation $\mathrm{supp}(x)$ is used for the support of $x$; i.e., the set of nonzero coordinate positions of $x$. A vector $x$ is called $k$-sparse if $|\mathrm{supp}(x)| \le k$. For a set $S \subseteq [N]$ we denote by $x_S$ the $N$-dimensional vector that agrees with $x$ on the coordinates picked by $S$ and is zero elsewhere. We thus have $x_{\bar{S}} = x - x_S$. All logarithms in this work are to the base 2.

Equivalent formulation by interchanging the roles of x and x 

Recall that in the original sparse Hadamard transform problem, the algorithm is given query access to a vector $x \in \mathbb{R}^N$ and the goal is to compute a $k$-sparse $\tilde{x}$ that approximates $\hat{x} = \mathrm{DHT}(x)$. That is,

$$\|\hat{x} - \tilde{x}\|_1 \le c \cdot \|\hat{x} - H_k(\hat{x})\|_1$$

for an absolute constant $c > 0$. However, since the Hadamard transform is its own inverse, i.e., $\mathrm{DHT}(\hat{x}) = x$, we can interchange the roles of $x$ and $\hat{x}$. That is, the original sparse Hadamard transform problem is equivalent to the problem of having query access to the Hadamard transform of $x$ (i.e., $\hat{x}$) and computing a $k$-sparse approximation of $x$ satisfying (2). Henceforth, throughout the paper, we consider this equivalent formulation, which is more convenient for establishing the connection with sparse recovery problems.

Approximation guarantees and the Restricted Isometry Property: We note that the guarantee in (2) is similar to the $\ell_1/\ell_1$ recovery studied in compressed sensing. In fact, the sparse Hadamard transform problem as formulated above is the same as $\ell_1/\ell_1$ recovery when the measurements are restricted to the set of linear forms extracting Hadamard coefficients. Thus our goal in this work is to present a non-adaptive sub-linear time algorithm that achieves the above requirements for all vectors $x$ and in a deterministic and efficient fashion. It is known that the so-called Restricted Isometry Property for the $\ell_1$ norm (RIP-1) characterizes the combinatorial property needed to achieve (2). Namely, we say that an $m \times N$ matrix $M$ satisfies RIP-1 of order $k$ with constant $\delta$ if for every $k$-sparse vector $x \in \mathbb{R}^N$,

$$(1 - \delta)\|x\|_1 \le \|Mx\|_1 \le (1 + \delta)\|x\|_1. \tag{3}$$

More generally, it is possible to consider RIP-$p$ for the $\ell_p$ norm, where the norm used in the above guarantee is $\ell_p$. As shown in [2], for any such matrix $M$, it is possible to obtain an approximation $\tilde{x}$ satisfying (2) from the knowledge of $Mx$. In fact, such a reconstruction can be achieved algorithmically using convex optimization methods, in time polynomial in $N$.

Expanders and condensers. It is well known that RIP-1 matrices with zero-one entries (before 
normalization) are equivalent to adjacency matrices of unbalanced expander graphs, which are 
formally defined below. 

Definition 4. A $D$-regular bipartite graph $G = (A, B, E)$, with $A$, $B$, $E$ respectively defining the set of left vertices, right vertices, and edges, is said to be a $(k, \epsilon)$-unbalanced expander graph if for every set $S \subseteq A$ such that $|S| \le k$, we have $|\Gamma(S)| \ge (1 - \epsilon) D |S|$, where $\Gamma(S)$ denotes the neighborhood of $S$.
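For small graphs, Definition 4 can be checked directly by enumerating all left sets of size at most $k$. The brute-force Python sketch below is our own illustration (its running time is exponential in $k$, so it is only meant for toy instances); it represents a left-regular bipartite graph by the list of right neighbors of each left vertex.

```python
from itertools import combinations


def is_unbalanced_expander(neighbors, k, eps):
    """Brute-force check of Definition 4.

    neighbors[a] lists the D right vertices adjacent to left vertex a.
    Returns True iff every left set S with 1 <= |S| <= k satisfies
    |Gamma(S)| >= (1 - eps) * D * |S|.
    """
    D = len(neighbors[0])
    if any(len(nb) != D for nb in neighbors):
        raise ValueError("graph must be left D-regular")
    for size in range(1, k + 1):
        for S in combinations(range(len(neighbors)), size):
            gamma = set()
            for a in S:
                gamma.update(neighbors[a])
            if len(gamma) < (1 - eps) * D * size:
                return False
    return True


# Three left vertices of degree 2; every pair has at least 3 neighbors.
toy = [[0, 1], [2, 3], [0, 3]]
print(is_unbalanced_expander(toy, 2, 0.25))  # True
```

The same toy graph fails for $k = 3$, since all three left vertices together have only 4 neighbors, fewer than $(1 - 0.25) \cdot 2 \cdot 3 = 4.5$.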

One direction of the above-mentioned characterization of binary RIP-1 matrices, which is important for the present work, is the following (which we will use only for the special case $p = 1$).

Theorem 5. ([2, Theorem 1]) Consider any $m \times N$ matrix $\Phi$ that is the adjacency matrix of a $(k, \epsilon)$-unbalanced expander graph $G = (A, B, E)$, $|A| = N$, $|B| = m$, with left degree $D$, such that $1/\epsilon$ and $D$ are smaller than $N$. Then, the scaled matrix $\Phi / D^{1/p}$ satisfies RIP-$p$ of order $k$ with constant $\delta$, for any $1 \le p \le 1 + 1/\log N$ and $\delta = C_0 \epsilon$ for some absolute constant $C_0 > 1$.

Unbalanced expander graphs can be obtained from the truth tables of lossless condensers, a class of pseudorandom functions defined below. We first recall that the min-entropy of a distribution $\mathcal{X}$ with finite support $\Omega$ is given by $H_\infty(\mathcal{X}) := \min_{x \in \Omega} \{-\log \mathcal{X}(x)\}$, where $\mathcal{X}(x)$ is the probability that $\mathcal{X}$ assigns to the outcome $x$. The statistical distance between two distributions $\mathcal{X}$ and $\mathcal{Y}$ defined on the same finite space $\Omega$ is given by $\max_{S \subseteq \Omega} |\mathcal{X}(S) - \mathcal{Y}(S)|$, which is half the $\ell_1$ distance of the two distributions when regarded as vectors of probabilities over $\Omega$. Two distributions $\mathcal{X}$ and $\mathcal{Y}$ are said to be $\epsilon$-close if their statistical distance is at most $\epsilon$.
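Both quantities are straightforward to compute for explicitly given finite distributions. A minimal Python sketch, assuming distributions are represented as dictionaries from outcomes to probabilities (a representation chosen here for illustration):

```python
import math


def min_entropy(dist):
    """H_inf(X) = min over outcomes of -log2 X(x) = -log2(max_x X(x))."""
    return -math.log2(max(dist.values()))


def statistical_distance(p, q):
    """max over events S of |P(S) - Q(S)|, computed as half the l1 distance."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in outcomes)


uniform4 = {w: 0.25 for w in "abcd"}
print(min_entropy(uniform4))  # 2.0
```

For a uniform distribution on a set $S$, the min-entropy is exactly $\log |S|$, which is why Definition 6 below asks for min-entropy $\log(D|S|)$ when the pair $(Z, h(X, Z))$ is to lose essentially no entropy.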

Definition 6. A function $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$ is a $(K, \epsilon)$-lossless condenser if for every set $S \subseteq \mathbb{F}_2^n$ of size at most $2^K$, the following holds: Let $X \in \mathbb{F}_2^n$ be a random variable uniformly sampled from $S$ and $Z \in [D]$ be uniformly random and independent of $X$. Then, the distribution of $(Z, h(X, Z))$ is $\epsilon$-close in statistical distance to some distribution with min-entropy at least $\log(D|S|)$. A condenser is explicit if it is computable in polynomial time in $n$.


Ideally, the hope is to attain $r = K + \log(1/\epsilon) + O(1)$ and $D = O(n/\epsilon)$. This is in fact achieved by a random function with high probability [8]. The equivalence of bipartite unbalanced expanders and lossless condensers was shown in [18]. Namely, we have the following.

Definition 7. Consider a function $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$. The (bipartite) graph associated with $h$ is a bipartite graph $G = (\mathbb{F}_2^n, \mathbb{F}_2^r \times [D], E)$ with the edge set $E$ defined as follows. For every $a \in \mathbb{F}_2^n$ and $(b, t) \in \mathbb{F}_2^r \times [D]$, there is an edge in $E$ between $a$ and $(b, t)$ iff $h(a, t) = b$. For any choice of $t \in [D]$, we define the function $h_t\colon \mathbb{F}_2^n \to \mathbb{F}_2^r$ by $h_t(x) := h(x, t)$. Then, the graph associated with $h_t$ is defined as the subgraph of $G$ induced by the restriction of the right vertices to the set $\{(b, t) : b \in \mathbb{F}_2^r\}$. We say that $h$ is linear in the first argument if $h_t$ is linear over $\mathbb{F}_2$ for every fixed choice of $t$.
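When $h$ is linear in the first argument, each $h_t$ is given by an $r \times n$ matrix over $\mathbb{F}_2$, and the adjacency matrix of the associated graph can be written down explicitly. In this Python sketch the encoding is our own assumption: each row of the matrix of $h_t$ is an $n$-bit integer mask, elements of $\mathbb{F}_2^n$ are integers, and the rows of the adjacency matrix are ordered by $t$ first and then by $b$.

```python
def apply_linear(A, a):
    """Evaluate h_t(a) = A a over F_2.

    A is a list of r rows, each an n-bit integer mask; a is an n-bit integer.
    Output bit k is the parity of popcount(A[k] & a).
    """
    out = 0
    for k, row in enumerate(A):
        out |= (bin(row & a).count("1") & 1) << k
    return out


def adjacency_matrix(maps, n):
    """Adjacency matrix of the graph associated with h, where maps[t] encodes h_t.

    Rows are indexed by (t, b) in t-major order, b in range(2**r);
    columns by a in range(2**n); the entry is 1 iff h(a, t) == b.
    """
    r = len(maps[0])
    return [
        [1 if apply_linear(A, a) == b else 0 for a in range(2 ** n)]
        for A in maps
        for b in range(2 ** r)
    ]


# h_0 reads bit 0 of a and h_1 reads bit 1 (n = 2, r = 1, D = 2).
M = adjacency_matrix([[0b01], [0b10]], 2)
```

Every column of the result contains exactly $D$ ones, one per choice of $t$, so the graph is left $D$-regular as Definition 4 requires.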

Lemma 8. ([18]) A function $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$ is a $(K, \epsilon)$-lossless condenser if and only if the bipartite graph associated to $h$ is a $(2^K, \epsilon)$-unbalanced expander.

3 Obtaining nearly optimal sample complexity 

Before focusing on the algorithmic aspect of the sparse Hadamard transform, we demonstrate that deterministic sparse Hadamard transform is possible in the information-theoretic sense. That is, as a
warm-up we first focus on a sample-efficient algorithm without worrying about the running time. 
The key tool that we use is the following observation whose proof is discussed in Section 3.1. 

Lemma 9. Let $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$, where $r \le n$, be a function computable in time $n^{O(1)}$ and linear in the first argument. Let $M \in \{0,1\}^{D2^r \times 2^n}$ be the adjacency matrix of the bipartite graph associated with $h$ (as in Definition 7). Then, for any $x \in \mathbb{R}^{2^n}$, the product $Mx$ can be computed using only query access to $\hat{x} = \mathrm{DHT}(x)$ from $D2^r$ deterministic queries to $\hat{x}$ and in time $D2^r n^{O(1)}$.

It is known that RIP-1 matrices suffice for sparse recovery in the $\ell_1/\ell_1$ model of (2). Namely:

Theorem 10. [2] Let $\Phi$ be a real matrix with $N$ columns satisfying RIP-1 of order $k$ with a sufficiently small constant $\delta > 0$. Then, for any vector $x \in \mathbb{R}^N$, there is an algorithm that, given $\Phi$ and $\Phi x$, computes an estimate $\tilde{x} \in \mathbb{R}^N$ satisfying (2) in time $N^{O(1)}$.

By combining this result with Lemma 8, Theorem 5, and Lemma 9, we immediately arrive at 
the following result. 

Theorem 11. There are absolute constants $c, \epsilon > 0$ such that the following holds. Suppose there is an explicit linear $(\log k, \epsilon)$-lossless condenser $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$ and let $N := 2^n$. Then, there is a deterministic algorithm running in time $N^{O(1)}$ that, given query access to $\hat{x} = \mathrm{DHT}(x) \in \mathbb{R}^N$, non-adaptively queries $\hat{x}$ at $D2^r$ locations and outputs $\tilde{x} \in \mathbb{R}^N$ such that

$$\|\tilde{x} - x\|_1 \le c \cdot \|x - H_k(x)\|_1.$$
Proof. Let $M \in \{0,1\}^{D2^r \times N}$ be the adjacency matrix of the bipartite graph associated with the condenser $h$. By Lemma 8, $M$ represents a $(k, \epsilon)$-unbalanced expander graph. Thus, by Theorem 5, $M/D$ satisfies RIP-1 of order $k$ with constant $\delta = C_0 \epsilon$. By Theorem 10, assuming $\epsilon$ (and thus $\delta$) is a sufficiently small constant, it suffices to show that the product $Mx$ for a given vector $x \in \mathbb{R}^N$ can be computed efficiently by only querying $\hat{x}$ non-adaptively at $D2^r$ locations. This is exactly what is shown by Lemma 9. □


One of the best known explicit constructions of lossless condensers is due to Guruswami et al. [8], which uses techniques from list-decodable algebraic codes. As observed by Cheraghchi [3], this construction can be modified to make the condenser linear. Namely, the above-mentioned result proves the following.

Theorem 12. [3, Corollary 2.23] Let $p$ be a fixed prime power and $\alpha > 0$ be an arbitrary constant. Then, for parameters $n \in \mathbb{N}$, $K \le n \log p$, and $\epsilon > 0$, there is an explicit linear $(K, \epsilon)$-lossless condenser $h\colon \mathbb{F}_p^n \times [D] \to \mathbb{F}_p^r$ satisfying $\log D \le (1 + 1/\alpha)(\log(nK/\epsilon) + O(1))$ and $r \log p \le \log D + (1 + \alpha) K$.

For completeness, we include a proof of Theorem 12 in Appendix A. Combined with Theorem 11, we conclude the following.

Corollary 13. For every $\alpha > 0$, integer parameters $N = 2^n$ and $k > 0$, and parameter $\epsilon > 0$, there is a deterministic algorithm running in time $N^{O(1)}$ that, given query access to $\hat{x} = \mathrm{DHT}(x) \in \mathbb{R}^N$, non-adaptively queries $\hat{x}$ at $O(k^{1+\alpha}(n \log k)^{2+2/\alpha}) = k^{1+\alpha}(\log N)^{O(1)}$ coordinate positions and outputs $\tilde{x} \in \mathbb{R}^N$ such that

$$\|\tilde{x} - x\|_1 \le c \cdot \|x - H_k(x)\|_1,$$

for some absolute constant $c > 0$.


3.1 Proof of Lemma 9 


For a vector $x \in \mathbb{R}^N$ and set $V \subseteq \mathbb{F}_2^n$, let $x(V)$ denote the summation

$$x(V) := \sum_{i \in V} x(i).$$


Lemma 9 is an immediate consequence of Lemma 15 below, before which we derive a simple 
proposition. 


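Proposition 14. Let $V \subseteq \mathbb{F}_2^n$ be a linear space. Then for every $a \in \mathbb{F}_2^n$, we have

$$x(a + V) = \frac{|V|}{\sqrt{N}} \sum_{i \in V^\perp} (-1)^{\langle a, i \rangle}\, \hat{x}(i).$$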
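Proof. We simply expand the summation according to the Hadamard transform formula (1), applied to $\hat{x}$ (recall that $x = \mathrm{DHT}(\hat{x})$), as follows:

$$x(a + V) = \sum_{i \in a + V} x(i) = \frac{1}{\sqrt{N}} \sum_{i \in a + V} \sum_{j \in \mathbb{F}_2^n} (-1)^{\langle i, j \rangle} \hat{x}(j) = \frac{1}{\sqrt{N}} \sum_{j \in \mathbb{F}_2^n} (-1)^{\langle a, j \rangle} \hat{x}(j) \sum_{i \in V} (-1)^{\langle i, j \rangle} = \frac{|V|}{\sqrt{N}} \sum_{j \in V^\perp} (-1)^{\langle a, j \rangle} \hat{x}(j),$$

where the last equality uses the basic linear-algebraic fact that

$$\sum_{i \in V} (-1)^{\langle i, j \rangle} = \begin{cases} |V| & \text{if } j \in V^\perp, \\ 0 & \text{if } j \notin V^\perp. \end{cases}$$

□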
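Proposition 14 can be sanity-checked numerically on a small instance. The Python sketch below (helper names are ours) evaluates (1) directly, enumerates $V$ and $V^\perp$ from a basis, and compares the coset sum $x(a + V)$ with the right-hand side of the proposition, which reads $\hat{x}$ only on $V^\perp$. Elements of $\mathbb{F}_2^n$ are encoded as integers, an assumption of this illustration.

```python
import math


def parity(u, v):
    """<u, v> over F_2 for integers encoding vectors of F_2^n."""
    return bin(u & v).count("1") & 1


def dht(x):
    """Direct evaluation of (1)."""
    N = len(x)
    return [sum((-1) ** parity(i, j) * x[i] for i in range(N)) / math.sqrt(N)
            for j in range(N)]


def span(basis):
    """All F_2-linear combinations of the given basis vectors, sorted."""
    vecs = {0}
    for b in basis:
        vecs |= {v ^ b for v in vecs}
    return sorted(vecs)


def dual_space(V, n):
    """V^perp = {j : <i, j> = 0 for every i in V}."""
    return [j for j in range(2 ** n) if all(parity(i, j) == 0 for i in V)]


def coset_sum(x, a, V):
    """x(a + V), read directly from x."""
    return sum(x[a ^ v] for v in V)


def coset_sum_from_spectrum(x_hat, a, V, n):
    """Right-hand side of Proposition 14; reads x_hat only on V^perp."""
    return len(V) / math.sqrt(2 ** n) * sum(
        (-1) ** parity(a, j) * x_hat[j] for j in dual_space(V, n))


n = 3
x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 4.0]
x_hat = dht(x)
V = span([0b011])  # V = {0, 3}, so V^perp = {0, 3, 4, 7}
for a in range(2 ** n):
    assert abs(coset_sum(x, a, V) - coset_sum_from_spectrum(x_hat, a, V, n)) < 1e-9
```

Only $|V^\perp| = 2^n/|V|$ spectral values enter the right-hand side, which is the source of the query savings in Lemma 9.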

Lemma 15. Let $V \subseteq \mathbb{F}_2^n$ be a linear space and $W \subseteq \mathbb{F}_2^n$ be a linear space complementing $V$. That is, $W$ is a linear subspace such that $|W| \cdot |V| = 2^n$ and $V + W = \mathbb{F}_2^n$. Then, the vector

$$v := (x(a + V) : a \in W) \in \mathbb{R}^{|W|}$$

can be computed in time $O(|W| \log(|W|)\, n)$ and by only querying $\hat{x}(i)$ for all $i \in V^\perp$, assuming that the algorithm is given a basis for $W$ and $V^\perp$.

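Proof. We will use a divide-and-conquer approach similar to the standard Fast Hadamard Transform algorithm. Let $r := \dim(W) = \dim(V^\perp) = n - \dim(V)$. Fix a basis $v_1, \ldots, v_r$ of $V^\perp$ and a basis $w_1, \ldots, w_r$ of $W$. For $i \in [r]$, let $V_i^\perp := \mathrm{span}\{v_1, \ldots, v_i\}$ and $W_i := \mathrm{span}\{w_1, \ldots, w_i\}$.

Let the matrix $H_i \in \{-1, +1\}^{2^i \times 2^i}$ be such that the rows and columns are indexed by the elements of $W_i$ and $V_i^\perp$, respectively, with the entry at row $w$ and column $v$ defined as $(-1)^{\langle w, v \rangle}$. Using this notation, by Proposition 14 the problem is equivalent to computing the matrix-vector product $H_r z$ for any given $z \in \mathbb{R}^{2^r}$ (namely, for $z = \frac{|V|}{\sqrt{N}}(\hat{x}(i))_{i \in V^\perp}$).

Note that $W_r = W_{r-1} \cup (w_r + W_{r-1})$ and, similarly, $V_r^\perp = V_{r-1}^\perp \cup (v_r + V_{r-1}^\perp)$. Let $D_r \in \{-1, 0, +1\}^{2^{r-1} \times 2^{r-1}}$ be a diagonal matrix with rows and columns indexed by the elements of $W_{r-1}$ and the diagonal entry at position $w \in W_{r-1}$ defined as $(-1)^{\langle w, v_r \rangle}$. Similarly, let $D'_r \in \{-1, 0, +1\}^{2^{r-1} \times 2^{r-1}}$ be a diagonal matrix with rows and columns indexed by the elements of $V_{r-1}^\perp$ and the diagonal entry at position $v \in V_{r-1}^\perp$ defined as $(-1)^{\langle w_r, v \rangle}$. Let $z = (z_0, z_1)$, where $z_0 \in \mathbb{R}^{2^{r-1}}$ (resp., $z_1 \in \mathbb{R}^{2^{r-1}}$) is the restriction of $z$ to the entries indexed by $V_{r-1}^\perp$ (resp., $v_r + V_{r-1}^\perp$). Using the above notation, we can derive the recurrence

$$H_r z = \big(H_{r-1} z_0 + D_r H_{r-1} z_1,\; H_{r-1} D'_r z_0 + (-1)^{\langle w_r, v_r \rangle} D_r H_{r-1} D'_r z_1\big).$$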
Therefore, after calling the transformation defined by $H_{r-1}$ twice as a subroutine, the product $H_r z$ can be computed using $O(2^r)$ operations on $n$-bit vectors. Therefore, the recursive procedure can compute the transformation defined by $H_r$ using $O(r 2^r)$ operations on $n$-bit vectors. □
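The block recurrence for $H_r z$ can be checked against the definition of $H_r$. In this Python sketch (helper names and the integer encoding of vectors are ours), elements of $W_i$ and $V_i^\perp$ are enumerated as $\mathbb{F}_2$-combinations of the first $i$ basis vectors, so that the first half of the indices corresponds to $W_{r-1}$ (resp., $V_{r-1}^\perp$), as in the proof. The recursion applies $H_{r-1}$ to the four vectors $z_0$, $z_1$, $D'_r z_0$, $D'_r z_1$ separately, so it verifies the identity but does not reproduce the $O(r 2^r)$ operation count.

```python
def combo(basis, c):
    """XOR of basis[k] over the set bits k of c: the c-th element of the span."""
    v = 0
    for k, b in enumerate(basis):
        if (c >> k) & 1:
            v ^= b
    return v


def parity(u, v):
    """<u, v> over F_2."""
    return bin(u & v).count("1") & 1


def h_direct(ws, vs, z):
    """H_r z from the definition: entry (w, v) of H_r is (-1)^<w, v>."""
    size = 2 ** len(ws)
    return [
        sum((-1) ** parity(combo(ws, a), combo(vs, b)) * z[b] for b in range(size))
        for a in range(size)
    ]


def h_recursive(ws, vs, z):
    """H_r z via the block recurrence of Lemma 15.

    z0 / z1 hold the entries indexed by V_{r-1}^perp and v_r + V_{r-1}^perp;
    the two output halves correspond to W_{r-1} and w_r + W_{r-1}.
    """
    r = len(ws)
    if r == 0:
        return list(z)
    half = 2 ** (r - 1)
    w_r, v_r = ws[-1], vs[-1]
    z0, z1 = z[:half], z[half:]
    d = [(-1) ** parity(combo(ws[:-1], a), v_r) for a in range(half)]        # D_r
    d_prime = [(-1) ** parity(w_r, combo(vs[:-1], b)) for b in range(half)]  # D'_r

    def sub(u):
        return h_recursive(ws[:-1], vs[:-1], u)

    h0, h1 = sub(z0), sub(z1)
    g0 = sub([s * t for s, t in zip(d_prime, z0)])
    g1 = sub([s * t for s, t in zip(d_prime, z1)])
    sign = (-1) ** parity(w_r, v_r)
    top = [h0[a] + d[a] * h1[a] for a in range(half)]
    bottom = [g0[a] + sign * d[a] * g1[a] for a in range(half)]
    return top + bottom


ws = [0b0001, 0b0110, 0b1010]  # illustrative basis of some W
vs = [0b0011, 0b0101, 0b1001]  # illustrative basis of some V^perp
z = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0]
assert all(abs(p - q) < 1e-9 for p, q in zip(h_recursive(ws, vs, z), h_direct(ws, vs, z)))
```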

Using the above tools, we are now ready to finish the proof of Lemma 9. Consider any $t \in [D]$. Let $V \subseteq \mathbb{F}_2^n$ be the kernel of $h_t$ and $N := 2^n$. Let $M^t$ be the $2^r \times N$ submatrix of $M$ consisting of the rows corresponding to the fixed choice of $t$. Our goal is to compute $M^t \cdot x$ for all fixings of $t$. Without loss of generality, we can assume that $h_t$ is surjective. If not, certain rows of $M^t$ would be all zeros, and the submatrix of $M^t$ obtained by removing such rows would correspond to a surjective linear function $h'_t$ whose kernel can be computed in time $n^{O(1)}$.

When $h_t$ is surjective, we have $\dim V = n - r$. Let $W \subseteq \mathbb{F}_2^n$ be the space of coset representatives of $V$ (i.e., $|W| = 2^r$ and $V + W = \mathbb{F}_2^n$). Note that we also have $|V^\perp| = 2^r$, and that a basis for $W$ and $V^\perp$ can be computed in time $n^{O(1)}$ (in fact, $V^\perp$ is generated by the rows of the $r \times n$ transformation matrix defined by $h_t$, and a generator for $W$ can be computed using Gaussian elimination in time $O(n^3)$).

By standard linear algebra, for each $y \in \mathbb{F}_2^r$ there is an $a(y) \in W$ such that $h_t^{-1}(y) = a(y) + V$, and $a(y)$ can be computed in time $n^{O(1)}$. Observe that $M^t x$ contains a row for each $y$, at which the corresponding inner product is the summation $\sum_{i \in h_t^{-1}(y)} x(i) = x(a(y) + V)$. Therefore, the problem reduces to computing the vector $(x(a + V) : a \in W)$, which, according to Lemma 15, can be computed in time $O(r 2^r)$ in addition to the time required for computing a basis for $W$ and $V^\perp$. By going over all choices of $t$, it follows that $Mx$ can be computed as claimed. This concludes the proof of Lemma 9. □

4 Obtaining nearly optimal reconstruction time 

The modular nature of the sparse Hadamard transform algorithm presented in Section 3 reduces the problem to general sparse recovery, which is of independent interest. As a result, in order to make the algorithm run in sublinear time, it suffices to design a sparse recovery algorithm analogous to Theorem 10 that runs in time sublinear in $N$. In this section we construct such an algorithm, which is also of independent interest for sparse recovery applications.

4.1 Augmentation of the sensing matrix 

A technique that has been used in the literature for fast reconstruction of exactly $k$-sparse vectors is the idea of augmenting the measurement matrix with additional rows that guide the search process (cf. [2]). For our application, one obstacle that is not present in general sparse recovery is that the augmented sketch should be computable only with access to Hadamard transform queries. For this reason, crucially, we cannot use an arbitrary sparse recovery algorithm as a black box and have to specifically design an augmentation that is compatible with the restrictive model of Hadamard transform queries. We thus restrict ourselves to tensor product augmentation with “bit selection” matrices defined as follows, and will later show that such augmentation can be implemented only using queries to the Hadamard coefficients.

Definition 16. The bit selection matrix $B \in \{0,1\}^{n \times N}$ with $n$ rows and $N = 2^n$ columns is a matrix with columns indexed by the elements of $\mathbb{F}_2^n$ such that the entry of $B$ at the $j$th row and $i$th column (where $j \in [n]$ and $i \in \mathbb{F}_2^n$) is the $j$th bit of $i$.

Definition 17. Let $A \in \{0,1\}^{m \times N}$ and $A' \in \{0,1\}^{m' \times N}$ be matrices. The tensor product $A \otimes A'$ is an $mm' \times N$ binary matrix with rows indexed by the elements of $[m] \times [m']$ such that, for $i \in [m]$ and $i' \in [m']$, the row of $A \otimes A'$ indexed by $(i, i')$ is the coordinate-wise product of the $i$th row of $A$ and the $i'$th row of $A'$.
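Definitions 16 and 17 translate directly into code. Note that the tensor product of Definition 17 multiplies rows entrywise, so both factors have the same $N$ columns; this differs from the Kronecker product. A minimal Python sketch, assuming that bit $j$ of $i$ means `(i >> j) & 1` (bit 0 least significant, a convention chosen here):

```python
def bit_selection_matrix(n):
    """B[j][i] is bit j of i (bit 0 = least significant), for i in range(2**n)."""
    return [[(i >> j) & 1 for i in range(2 ** n)] for j in range(n)]


def tensor_rows(A, A2):
    """Definition 17: the row indexed by (i, i2), listed i-major, is the
    entrywise product of row i of A and row i2 of A2."""
    if any(len(ra) != len(A2[0]) for ra in A):
        raise ValueError("both factors need the same number of columns")
    return [[u * v for u, v in zip(ra, rb)] for ra in A for rb in A2]


B = bit_selection_matrix(2)  # [[0, 1, 0, 1], [0, 0, 1, 1]]
M = [[1, 0, 1, 0], [0, 1, 0, 1]]
print(tensor_rows(M, B))  # [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 1], [0, 0, 0, 1]]
```

Each row of $M \otimes B$ restricts a row of $M$ to the columns whose index has a given bit set, which is what makes these rows useful for locating coordinates.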

We will use tensor products of expander-based sensing matrices with bit selection matrix, and 
extend the result of Lemma 9 to such products. 
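To illustrate why bit selection rows can guide the search, here is a hypothetical example of the standard bit-testing idea (it is not the recovery algorithm of Section 4.2): if a row of $M$ captures exactly one nonzero coordinate $x(i)$, then the corresponding $n$ rows of $M \otimes B$ return $x(i)$ at the positions of the 1-bits of $i$ and $0$ elsewhere, so $i$ can be read off bit by bit. The Python sketch below handles only this noiseless single-coordinate case; approximately sparse inputs need the more careful treatment developed in the paper.

```python
def decode_singleton(bin_value, bit_values, tol=1e-9):
    """Hypothetical decoder for a bin that holds exactly one nonzero x(i).

    bin_value  : the entry of M x for this bin, i.e. sum of x(a) over the bin.
    bit_values : the matching entries of (M tensor B) x, one per bit b,
                 i.e. sum of x(a) * bit_b(a) over the bin.
    With a single nonzero x(i), bit_values[b] equals x(i) if bit b of i is 1
    and 0 otherwise, so i is recovered one bit at a time.
    """
    if abs(bin_value) <= tol:
        return None  # empty bin (or cancellation): nothing to decode
    i = 0
    for b, val in enumerate(bit_values):
        if abs(val - bin_value) < abs(val):  # closer to x(i) than to 0
            i |= 1 << b
    return i


n = 4
x = [0.0] * 2 ** n
x[11] = 2.5                       # the only nonzero entry in this bin
bin_members = range(2 ** n)       # toy bin containing every index
bin_value = sum(x[a] for a in bin_members)
bit_values = [sum(x[a] * ((a >> b) & 1) for a in bin_members) for b in range(n)]
assert decode_singleton(bin_value, bit_values) == 11
```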

Lemma 18. Let $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$, where $r \le n$, be a function computable in time $n^{O(1)}$ and linear in the first argument, and define $N := 2^n$. Let $M \in \{0,1\}^{D2^r \times N}$ be the adjacency matrix of the bipartite graph associated with $h$ (as in Definition 7) and $M' := M \otimes B$, where $B \in \{0,1\}^{n \times N}$ is the bit selection matrix with $n$ rows. Then, for any $x \in \mathbb{R}^N$, the product $M'x$ can be computed using only query access to $\hat{x}$ from $O(D2^r n)$ deterministic queries to $\hat{x}$ and in time $D2^r n^{O(1)}$.

Proof. For each $b \in [n]$, define $h'_b\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^{r+1}$ to be $h'_b(x, z) := (h(x, z), x(b))$. Note that since $h$ is linear over $\mathbb{F}_2$, so is $h'_b$ for all $b$. Let $M_b \in \{0,1\}^{D2^{r+1} \times N}$ be the adjacency matrix of the bipartite graph associated with $h'_b$ (as in Definition 7) and $M'' \in \{0,1\}^{Dn2^{r+1} \times N}$ be the matrix resulting from stacking $M_1, \ldots, M_n$ on top of each other. One can see that the set of rows of $M''$ contains the $Dn2^r$ rows of $M' = M \otimes B$.

By Lemma 9 (applied to all choices of $h'_b$ for $b \in [n]$), the product $M''x$ (and hence, $M'x$) can be computed using only query access to $\hat{x}$, from $O(Dn2^r)$ deterministic queries to $\hat{x}$ and in time $O(Drn2^r)$. This completes the proof. □
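The row-containment claim in the proof above can be checked mechanically on a toy instance. The sketch below assumes a hypothetical linear $h_t$ given by bit-mask rows ($n = 3$, $r = 2$), builds the adjacency matrix of $h_t$ and of each augmented map $h'_b$, and checks that every row of $M^t \otimes B_b$ appears as the row of $h'_b$ indexed by $(a, 1)$. It illustrates only this combinatorial identity, not the query-efficient computation of Lemma 9.

```python
def parity(v):
    return bin(v).count("1") & 1


def h(rows, i):
    """Toy linear h_t: output bit c is the parity of rows[c] & i (rows are bit masks)."""
    return sum(parity(row & i) << c for c, row in enumerate(rows))


def h_aug(rows, b, i):
    """h'_b(i) = (h_t(i), bit b of i), with the extra bit stored above the r hash bits."""
    return h(rows, i) | (((i >> b) & 1) << len(rows))


def adjacency(f, n, out_bits):
    """0/1 matrix whose row a is the indicator vector of f^{-1}(a) over F_2^n."""
    return [[1 if f(i) == a else 0 for i in range(1 << n)] for a in range(1 << out_bits)]


rows = [0b011, 0b110]                 # hypothetical h_t with n = 3, r = 2
Mt = adjacency(lambda i: h(rows, i), 3, 2)
# Row (a, 1) of the graph of h'_b should equal row a of M^t ⊗ B_b, for every a and b.
all_rows_found = all(
    adjacency(lambda i: h_aug(rows, b, i), 3, 3)[a | (1 << 2)]
    == [m * ((i >> b) & 1) for i, m in enumerate(Mt[a])]
    for b in range(3) for a in range(4))
print(all_rows_found)  # True
```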

In order to improve the running time of the algorithm in Theorem 11, we use the following
result, which is our main technical tool and is discussed in Section 4.2.

Theorem 19. There are absolute constants $c > 0$ and $\epsilon > 0$ such that the following holds. Let $k, n, L$ ($k < n$ and $\log L = n^{O(1)}$) be positive integer parameters, and suppose there exists a function $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$ (where $r < n$) which is an explicit $(\log(4k), \epsilon)$-lossless condenser. Let $M$ be the adjacency matrix of the bipartite graph associated with $h$ and $B$ be the bit-selection matrix with $n$ rows and $N := 2^n$ columns. Then, there is an algorithm that, given $k$ and the vectors $Mx$ and $(M \otimes B)x$ for some $x \in \mathbb{R}^N$ (which is not given to the algorithm and whose entries are $n^{O(1)}$ bits long), computes a $k$-sparse estimate $\tilde{x}$ satisfying
$$\|\tilde{x} - x\|_1 \le c \cdot \|x - H_k(x)\|_1.$$
Moreover, the running time of the algorithm is $O(2^r D^2 n^{O(1)})$.

The above result is proved using the algorithm discussed in Section 4.2. By using this result in 
conjunction with Lemma 18 in the proof of Theorem 11, we obtain our main result as follows. 

Theorem 20. (Main) There are absolute constants $c > 0$ and $\epsilon > 0$ such that the following holds. Let $k, n$ ($k < n$) be positive integer parameters, and suppose there exists a function $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$ (where $r < n$) which is an explicit $(\log(4k), \epsilon)$-lossless condenser and is linear in the first argument. Then, there is a deterministic algorithm running in time $2^r D^2 n^{O(1)}$ that, given (non-adaptive) query access to $\hat{x} \in \mathbb{R}^N$ (where $N := 2^n$, and each entry of $\hat{x}$ is $n^{O(1)}$ bits long), outputs $\tilde{x} \in \mathbb{R}^N$ such that
$$\|x - \tilde{x}\|_1 \le c \cdot \|x - H_k(x)\|_1.$$

Proof. We closely follow the proof of Theorem 11, but use Theorem 19 instead of Theorem 10.

Since each entry of $\hat{x}$ is $n^{O(1)}$ bits long and the Hadamard transform matrix (after normalization) only contains $\pm 1$ entries, we see that each entry of $\sqrt{N}x$ is $n^{O(1)}$ bits long as well.

Let $M$ be the adjacency matrix of the bipartite expander graph associated with $h$, $B$ be the bit selection matrix with $n$ rows, and $M' := M \otimes B$. By the argument of Theorem 11, the product $Mx$ can be computed in time $2^r D n^{O(1)}$ using only non-adaptive query access to $\hat{x}$. The same is true for the product $M'x$, using a similar argument and Lemma 18. Once computed, this information can be passed to the algorithm guaranteed by Theorem 19 to compute the desired estimate of $x$. □

Finally, by using the condenser of Theorem 12 in the above theorem, we immediately obtain 
Theorem 1 as a corollary, which is restated below. 

Theorem 1 (restated). For every fixed constant $\alpha > 0$, there is a deterministic algorithm as follows. Let $N = 2^n$ and $k < N$ be positive integers. Then, given (non-adaptive) query access to any $x \in \mathbb{R}^N$ where each coefficient of $x$ is $n^{O(1)}$ bits long, the algorithm runs in time $k^{1+\alpha}(\log N)^{O(1)}$ and outputs $\tilde{x} \in \mathbb{R}^N$ that satisfies (2) (where $\hat{x} = \mathrm{DHT}(x)$) for some absolute constant $c > 0$.

4.2 The sparse recovery algorithm 

The claim of Theorem 19 is shown using the algorithm presented in Figure 1. In this algorithm, $M'$ is the $D2^r(n+1) \times N$ matrix formed by stacking $M$ on top of $M \otimes B$, and the algorithm is given $y := M'x$ for a vector $x \in \mathbb{R}^N$ to be approximated. For each $t \in [D]$, we define the $2^r \times N$ matrix $M^t$ to be the adjacency matrix of the bipartite graph $G^t$ associated with $h_t$ (according to Definition 7). For $b \in [n]$ we let $B_b \in \{0,1\}^{1 \times N}$ be the $b$th row of $B$. We assume that the entries of $y$ are indexed by the set $\mathbb{F}_2^r \times [D] \times \{0, \ldots, n\}$, where the entry $(a, t, 0)$ corresponds to the inner product defined by the $a$th row of $M^t$ and the entry $(a, t, b)$ (for $b \neq 0$) corresponds to the $a$th row of $M^t \otimes B_b$. Since each entry of $x$ is $n^{O(1)}$ bits long, by using appropriate scaling we can without loss of generality
SEARCH($j \in \mathbb{F}_2^r$, $t \in [D]$, $s \in \mathbb{N}$)
 1  for $b = 1$ to $n$
 2      if $|y^{s,t,b}(j)| > |y^{s,t,0}(j)|/2$
 3          $u_b = 1$.
 4      else
 5          $u_b = 0$.
 6  return $(u_1, \ldots, u_n)$.

ESTIMATE($t \in [D]$, $s \in \mathbb{N}$)
 1  Initialize $S \subseteq \mathbb{F}_2^n$ as $S = \emptyset$.
 2  Initialize $\Delta^{s,t} \in \mathbb{R}^N$ as $\Delta^{s,t} = 0$.
 3  Let $T \subseteq \mathbb{F}_2^r$ be the set of coordinate positions corresponding to the largest $2k$ entries of $y^{s,t,0}$.
 4  for $j \in T$
 5      $u = \mathrm{SEARCH}(j, t, s)$.
 6      if $h(u, t) \in T$
 7          $S = S \cup \{u\}$.
 8          $\Delta^{s,t}(u) = y^{s,t,0}(h(u, t))$.
 9  return $\Delta^{s,t}$.

RECOVER($y \in \mathbb{R}^{D2^r(n+1)}$, $s_0 \in \mathbb{N}$)
 1  $s = 0$.
 2  Let $B_1, \ldots, B_n \in \{0,1\}^{1 \times N}$ be the rows of the bit selection matrix $B$.
 3  Initialize $x^0 \in \mathbb{R}^N$ as $x^0 = 0$.
 4  for $(t, b, j) \in [D] \times \{0, \ldots, n\} \times \mathbb{F}_2^r$
 5      $y^{0,t,b}(j) = y(j, t, b)$.
 6  repeat
 7      for $t \in [D]$
 8          $y^{s,t,0} \leftarrow M^t \cdot (x - x^s) \in \mathbb{R}^{2^r}$.
 9          for $b \in [n]$
10              $y^{s,t,b} \leftarrow (M^t \otimes B_b) \cdot (x - x^s) \in \mathbb{R}^{2^r}$.
11          $\Delta^{s,t} = \mathrm{ESTIMATE}(t, s)$.
12      Let $t_0$ be the choice of $t \in [D]$ that minimizes $\|Mx - M(x^s + \Delta^{s,t})\|_1$.
13      $x^{s+1} = H_k(x^s + \Delta^{s,t_0})$.
14      $s = s + 1$.
15  until $s = s_0$.
16  Set $x^*$ to be the choice of $x^s$ (for $s = 0, \ldots, s_0$) that minimizes $\|Mx - Mx^s\|_1$.
17  return $x^*$.

Figure 1: Pseudo-code for the reconstruction algorithm RECOVER$(y, s_0)$, where $y$ is the sketch $M'x$ and $s_0$ specifies the desired number of iterations. It suffices to set $s_0 = \log(NL) + O(1)$ according to the bit length of $x$. Notation is explained in Section 4.2.
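The following Python sketch mirrors Figure 1 on a toy instance to make the data flow concrete. It is a hypothetical illustration, not the authors' implementation: each $h_t$ is assumed to be an $\mathbb{F}_2$-linear map given by bit-mask rows (`hash_linear`), sparse vectors are Python dicts, all products are recomputed naively, and the sketch $y$ is passed as a per-$t$ pair $(M^t x, [(M^t \otimes B_b)x]_b)$ rather than as one flat vector. Helper names such as `sketch`, `residual`, and `top_k` are ours.

```python
def hash_linear(rows, i):
    """Toy F_2-linear hash (hypothetical): output bit c is the parity of rows[c] & i."""
    return sum((bin(row & i).count("1") & 1) << c for c, row in enumerate(rows))


def sketch(w, h, n, r):
    """(M^t w, [(M^t ⊗ B_b) w for b in range(n)]) for a sparse dict w (cf. Proposition 22)."""
    bins = [0.0] * (1 << r)
    bit_bins = [[0.0] * (1 << r) for _ in range(n)]
    for i, v in w.items():
        j = h(i)
        bins[j] += v
        for b in range(n):
            if (i >> b) & 1:             # row B_b of the bit selection matrix
                bit_bins[b][j] += v
    return bins, bit_bins


def search(j, y0, yb, n):
    """SEARCH: bit b of u is 1 iff |y^{s,t,b}(j)| > |y^{s,t,0}(j)| / 2."""
    u = 0
    for b in range(n):
        if abs(yb[b][j]) > abs(y0[j]) / 2:
            u |= 1 << b
    return u


def estimate(h, y0, yb, n, k):
    """ESTIMATE: decode the 2k heaviest bins of the residual sketch y^{s,t,0}."""
    T = set(sorted(range(len(y0)), key=lambda j: -abs(y0[j]))[:2 * k])
    delta = {}
    for j in T:
        u = search(j, y0, yb, n)
        if h(u) in T:
            delta[u] = y0[h(u)]
    return delta


def add(a, b):
    out = dict(a)
    for i, v in b.items():
        out[i] = out.get(i, 0.0) + v
    return out


def top_k(w, k):
    """H_k: keep the k entries of largest magnitude (dropping zeros)."""
    best = sorted(w.items(), key=lambda kv: -abs(kv[1]))[:k]
    return {i: v for i, v in best if v != 0}


def residual(y, w, hashes, n, r):
    """||Mx - Mw||_1, using only the parts y[t][0] = M^t x of the sketch."""
    return sum(abs(a - c)
               for t, h in enumerate(hashes)
               for a, c in zip(y[t][0], sketch(w, h, n, r)[0]))


def recover(y, hashes, n, r, k, s0):
    """RECOVER: s0 rounds of estimate / select / truncate, then pick the best iterate."""
    xs = [{}]                                    # x^0 = 0
    for _ in range(s0):
        x_s = xs[-1]
        cands = []
        for t, h in enumerate(hashes):           # Lines 7-11
            mb, mbb = sketch(x_s, h, n, r)
            y0 = [a - c for a, c in zip(y[t][0], mb)]
            yb = [[a - c for a, c in zip(y[t][1][b], mbb[b])] for b in range(n)]
            cands.append(add(x_s, estimate(h, y0, yb, n, k)))
        t0 = min(range(len(hashes)), key=lambda t: residual(y, cands[t], hashes, n, r))
        xs.append(top_k(cands[t0], k))           # Line 13
    return min(xs, key=lambda w: residual(y, w, hashes, n, r))   # Line 16


n, r, k = 4, 3, 2
hashes = [lambda i: hash_linear([0b0001, 0b0010, 0b0100], i),
          lambda i: hash_linear([0b0011, 0b0110, 0b1100], i)]
x_true = {3: 5.0, 12: -2.0}                      # used only to build the sketch
y = [sketch(x_true, h, n, r) for h in hashes]
x_rec = recover(y, hashes, n, r, k, s0=2)
print(x_rec)  # {3: 5.0, 12: -2.0}
```

In this toy run, hash $t = 0$ separates the two support elements, so the first iteration already recovers $x$ exactly; hash $t = 1$ maps both to the same bin, and its candidate is rejected by the $\ell_1$ test of Line 12.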
assume that $x$ has integer entries in the range $[-L, +L]$ for some $L$ such that $\log L = n^{O(1)}$, and that the
algorithm’s output can be rounded to the nearest integer in each coordinate so as to make sure
that the final output is integral.

The main ingredient of the analysis is the following lemma, which is proved in Section 4.5.

Lemma 21. For every constant $\gamma > 0$, there is an $\epsilon_0$ only depending on $\gamma$ such that if $\epsilon \le \epsilon_0$ the following holds. Suppose that for some $s$,
$$\|x - x^s\|_1 > C\|x - H_k(x)\|_1$$
for $C = 1/\epsilon$. Then, there is a $t \in [D]$ such that
$$\|x - (x^s + \Delta^{s,t})\|_1 \le \gamma\|x - x^s\|_1.$$

The above lemma can be used, in conjunction with the fact that $M$ satisfies RIP-1, to show that if $\epsilon$ is a sufficiently small constant, we can ensure exponential progress $\|x - x^{s+1}\|_1 \le \|x - x^s\|_1/2$ (shown in Corollary 27) until the approximation error $\|x - x^s\|_1$ reaches the desired level of $C\|x - H_k(x)\|_1$ (after the final truncation). It then easily follows that $s_0 = \log(NL) + O(1) = n^{O(1)}$ iterations suffice to deduce Theorem 19. The formal proof of Theorem 19 appears in Section 4.4.

4.3 Analysis of the running time 

In order to analyze the running time of the procedure Recover, we first observe that all the estimates $x^s$ are $k$-sparse vectors and can be represented in time $O(k(\log N + \log L))$ by only listing the positions and values of their nonzero entries. In this section we assume that all sparse $N$-dimensional vectors are represented in such a way. We observe the following.

Proposition 22. Let $w \in \mathbb{R}^N$ be $k$-sparse. Then, for any $t \in [D]$, the products $(M^t \otimes B) \cdot w$ and $M^t w$ can be computed in time $n^{O(1)}(k + 2^r)\ell$, assuming each entry of $w$ is represented within $\ell$ bits of precision.

Proof. Let $B_1, \ldots, B_n \in \{0,1\}^{1 \times N}$ be the rows of the bit selection matrix $B$. Observe that each column of $M^t$ is entirely zero except for a single 1 (this is because $M^t$ represents the truth table of the function $h_t$). The product $M^t \cdot w$ is simply the sum of at most $k$ such 1-sparse vectors, and thus is itself $k$-sparse. The nonzero entries of $M^t \cdot w$ along with their values can thus be computed by querying the function $h_t$ at up to $k$ points (corresponding to the support of $w$), followed by $k$ real additions. Since $h_t$ can be computed in time polynomial in $n$, we see that $M^t \cdot w$ can be computed in time $n^{O(1)}(k + 2^r)\ell$ (we may assume that the product is represented trivially as an array of length $2^r$, and thus it takes $2^r$ additional operations to initialize the result vector). The claim then follows once we observe that for every $b \in [n]$, the matrix $M^t \otimes B_b$ is even sparser than $M^t$. □
Observe that the procedure Search needs $O(n)$ operations. In procedure Estimate, identifying $T$ takes $O(2^r)$ time, and the loop runs for $2k$ iterations, each taking $O(nk)$ time. In procedure Recover, we note that $M^t x$ for all $t$, as well as $(M^t \otimes B_b)x$ for all $b \in [n]$, is given as a part of $y$ at the input. Moreover, all the vectors $x^s$ and $\Delta^{s,t}$ are $O(k)$-sparse. Thus, in light of Proposition 22, and noting that $\log L = n^{O(1)}$ and that $h$ is a lossless condenser (which implies $2^r = \Omega(k)$), we see that the computation of each product in Lines 8 and 10 of procedure Recover takes time $n^{O(1)} 2^r$.

Since the for loop runs for $D$ iterations and $s_0$ is the number of iterations, the running time of the loop is $n^{O(1)} D 2^r$. With a similar reasoning, the computation in Line 12 takes time $D^2 2^r n^{O(1)}$. Similarly, computation of the product in Line 16 of procedure Recover takes time $2^r D n^{O(1)} s_0$. Altogether, recalling that $s_0 = \log(NL) + O(1) = n^{O(1)}$, the total running time is $D^2 2^r n^{O(1)}$.

4.4 Proof of Theorem 19 

Theorem 19 is proved using the algorithm presented in Figure 1 and discussed in Section 4.2. We aim to set up the algorithm so that it outputs a $k$-sparse estimate $\tilde{x} \in \mathbb{R}^N$ satisfying (2). Instead of achieving this goal directly, we first consider the following slightly different estimate
$$\|\tilde{x} - x\|_1 \le C\|x - H_k(x)\|_1 + \nu\|x\|_1, \tag{4}$$
for an absolute constant $C > 0$, where $\nu > 0$ is an arbitrarily small “relative error” parameter. Let us show that this alternative guarantee implies (2), after rounding the estimate obtained by the procedure Recover to the nearest integer vector. Recall that without loss of generality (by using appropriate scaling), we can assume that $x$ has integer coordinates in the range $[-L, +L]$, for some $L$ satisfying $\log L = n^{O(1)}$.

Proposition 23. Let $x \in \mathbb{R}^N$ be an integer vector with integer coordinates in the range $[-L, +L]$, and let $\tilde{x} \in \mathbb{R}^N$ be such that (4) holds for some $\nu \le 1/(4NL)$. Let $\tilde{x}'$ be the vector obtained by rounding each entry of $\tilde{x}$ to the nearest integer. Then, $\tilde{x}'$ satisfies
$$\|\tilde{x}' - x\|_1 \le (3C + 1/2) \cdot \|x - H_k(x)\|_1.$$

Proof. If $x = 0$, there is nothing to show. Otherwise, we consider two cases.

Case 1: $\|x - H_k(x)\|_1 = 0$. In this case, since $\|x\|_1 \le NL$, we see that $\|\tilde{x} - x\|_1 \le 1/4$. Therefore, rounding $\tilde{x}$ to the nearest integer vector would exactly recover $x$.

Case 2: $\|x - H_k(x)\|_1 > 0$. Since $x$ is an integer vector, we have $\|x - H_k(x)\|_1 \ge 1$. Therefore, again noting that $\|x\|_1 \le NL$, from (4) we see that
$$\|\tilde{x} - x\|_1 \le (C + 1/4) \cdot \|x - H_k(x)\|_1.$$
Therefore, by an averaging argument, the number of coordinate positions at which $\tilde{x}$ differs from $x$ by $1/2$ or more is at most $2(C + 1/4) \cdot \|x - H_k(x)\|_1$. Since $x$ is integral, rounding recovers $x$ exactly at all other positions, and at each such position it increases the error by at most $1/2$ (as $|\tilde{x}'(i) - x(i)| \le |\tilde{x}(i) - x(i)| + 1/2$). Hence the added error caused by rounding is at most $(C + 1/4) \cdot \|x - H_k(x)\|_1$, so that $\|\tilde{x}' - x\|_1 \le (2C + 1/2) \cdot \|x - H_k(x)\|_1$, and the claim follows. □

In light of Proposition 23 above, in the sequel we focus on achieving (4) for a general $\nu$, and will finally choose $\nu := 1/(4NL)$ so that, using Proposition 23, we can attain the original estimate in (2). We remark that Proposition 23 is the only place in the proof that assumes finite precision for $x$, and we do not need such an assumption for achieving (4).

A key ingredient of the analysis is the following result (Lemma 25 below) shown in [2]. Before 
presenting the result, we define the following notation. 

Definition 24. Let $w = (w_1, \ldots, w_N) \in \mathbb{R}^N$ be any vector and $G$ be any bipartite graph with left vertex set $[N]$ and edge set $E$. Then, $\mathrm{First}(G, w)$ denotes the following subset of edges:
$$\mathrm{First}(G, w) := \{e = (i, j) \in E \mid \forall e' = (i', j) \in E \text{ with } e' \neq e\colon\ (|w_i| > |w_{i'}|) \vee (|w_i| = |w_{i'}| \wedge i' > i)\}.$$
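Definition 24 can be computed in one pass over the edges. The sketch below is a hypothetical illustration: the graph is a plain list of (left, right) pairs, ties are broken toward the smaller left index as in the definition, and `non_first_mass` evaluates the left-hand side of the inequality in Lemma 25 below.

```python
def first_edges(edges, w):
    """First(G, w): for each right vertex j, keep the edge (i, j) whose |w[i]| is largest,
    breaking ties in favour of the smaller left index i (as in Definition 24)."""
    best = {}
    for i, j in edges:
        key = (-abs(w[i]), i)               # smaller key = preferred edge
        if j not in best or key < best[j][0]:
            best[j] = (key, i)
    return {(i, j) for j, (_, i) in best.items()}


def non_first_mass(edges, w):
    """Sum of |w[i]| over edges (i, j) not in First(G, w)."""
    keep = first_edges(edges, w)
    return sum(abs(w[i]) for i, j in edges if (i, j) not in keep)


w = [3.0, -5.0, 1.0, 0.0]
# Toy graph, 2-regular from the left, with right vertices 0..3 (hypothetical).
edges = [(0, 0), (1, 0), (2, 1), (3, 1), (0, 2), (2, 2), (1, 3), (3, 3)]
print(sorted(first_edges(edges, w)))   # [(0, 2), (1, 0), (1, 3), (2, 1)]
print(non_first_mass(edges, w))        # 4.0
```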

Lemma 25. [2] Let $G$ be a $(k', \epsilon)$-unbalanced expander graph with left vertex set $[N]$ and edge set $E$. Then, for any $k'$-sparse vector $w = (w_1, \ldots, w_N) \in \mathbb{R}^N$, we have
$$\sum_{(i,j) \in E \setminus \mathrm{First}(G, w)} |w_i| \le \epsilon \sum_{(i,j) \in E} |w_i|.$$

Intuitively, for every right vertex in $G$, $\mathrm{First}(G, w)$ picks exactly one edge connecting the vertex to the left neighbor at which $w$ has the highest magnitude (with ties broken in a consistent way), and Lemma 25 shows that these edges pick up most of the mass of $w$.

We apply Lemma 25 to the graph $G$ that we set to be the graph associated with the function $h$. Note that this graph is a $(4k, \epsilon)$-unbalanced expander by Lemma 8. This means that for every $(4k)$-sparse vector $w$, letting $E$ denote the edge set of $G$, we have
$$\sum_{(i,j) \in E \setminus \mathrm{First}(G, w)} |w_i| \le \epsilon \sum_{(i,j) \in E} |w_i| = \epsilon D\|w\|_1,$$
where the last equality uses the fact that $G$ is $D$-regular from the left. By an averaging argument, and noting that $G$ is obtained by taking the union of the edges of the graphs $G^1, \ldots, G^D$ (each of which is 1-regular from the left), we get that for some $t(G, w) \in [D]$,
$$\sum_{(i,j) \in E^{t(G,w)} \setminus \mathrm{First}(G, w)} |w_i| \le \epsilon\|w\|_1, \tag{5}$$
where $E^t$ denotes the edge set of $G^t$.

Our goal will be to show that the algorithm converges exponentially to the near-optimal solution. 
In particular, in the following we show that if the algorithm is still “far” from the optimal solution 
on the sth iteration, it obtains an improved approximation for the next iteration. This is made 
precise in Lemma 21, which we recall below. 
Lemma 21. (restated) For every constant $\gamma > 0$, there is an $\epsilon_0$ only depending on $\gamma$ such that if $\epsilon \le \epsilon_0$ the following holds. Suppose that for some $s$,
$$\|x - x^s\|_1 > C\|x - H_k(x)\|_1 \tag{6}$$
for $C = 1/\epsilon$. Then, there is a $t \in [D]$ such that
$$\|x - (x^s + \Delta^{s,t})\|_1 \le \gamma\|x - x^s\|_1.$$
The proof of Lemma 21 is deferred to Section 4.5.
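Proposition 26. Suppose $x', x'' \in \mathbb{R}^N$ are $(3k)$-sparse and satisfy
$$\|M(x - x')\|_1 \le \|M(x - x'')\|_1.$$
Then,
$$\|x - x'\|_1 \le \Big(1 + \frac{3 + C_0\epsilon}{1 - C_0\epsilon}\Big)\|x - H_k(x)\|_1 + \frac{1 + C_0\epsilon}{1 - C_0\epsilon}\|x - x''\|_1, \tag{7}$$
where $C_0$ is the constant in Theorem 5. In particular, when $C_0\epsilon \le 1/2$, we have
$$\|x - x'\|_1 \le 8\|x - H_k(x)\|_1 + 3\|x - x''\|_1.$$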


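Proof.
\begin{align}
\|x - x'\|_1 &\le \|x - H_k(x)\|_1 + \|H_k(x) - x'\|_1 \tag{8}\\
&\le \|x - H_k(x)\|_1 + \frac{\|MH_k(x) - Mx'\|_1}{D(1 - C_0\epsilon)} \tag{9}\\
&\le \|x - H_k(x)\|_1 + \frac{\|Mx - Mx'\|_1 + \|M(x - H_k(x))\|_1}{D(1 - C_0\epsilon)} \tag{10}\\
&\le \|x - H_k(x)\|_1 + \frac{\|Mx - Mx''\|_1 + \|M(x - H_k(x))\|_1}{D(1 - C_0\epsilon)} \tag{11}\\
&\le \|x - H_k(x)\|_1 + \frac{\|MH_k(x) - Mx''\|_1 + 2\|M(x - H_k(x))\|_1}{D(1 - C_0\epsilon)} \tag{12}\\
&\le \|x - H_k(x)\|_1 + \frac{(1 + C_0\epsilon)\|H_k(x) - x''\|_1 + 2\|x - H_k(x)\|_1}{1 - C_0\epsilon} \tag{13}\\
&\le \|x - H_k(x)\|_1 + \frac{(1 + C_0\epsilon)\|x - x''\|_1 + (3 + C_0\epsilon)\|x - H_k(x)\|_1}{1 - C_0\epsilon} \tag{14}\\
&= \Big(1 + \frac{3 + C_0\epsilon}{1 - C_0\epsilon}\Big)\|x - H_k(x)\|_1 + \frac{1 + C_0\epsilon}{1 - C_0\epsilon}\|x - x''\|_1. \tag{15}
\end{align}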

In the above, (8), (10), (12), and (14) use the triangle inequality (after adding and subtracting $H_k(x)$, $Mx$, $MH_k(x)$, and $x$ inside the norms, respectively); (9) and (13) use RIP-1 of the matrix $M$ (seeing that $x'$, $x''$, and $H_k(x)$ are sufficiently sparse); (11) uses the assumption that $\|M(x - x')\|_1 \le \|M(x - x'')\|_1$; (13) also uses the fact that all columns of $M$ have Hamming weight $D$, and thus the matrix cannot increase the $\ell_1$ norm of any vector by more than a factor $D$. □
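For reference, the simplified constants in the second bound of Proposition 26 come from evaluating the two fractions in the first bound at $C_0\epsilon = 1/2$; both fractions are increasing in $C_0\epsilon$, so this gives upper bounds whenever $C_0\epsilon \le 1/2$:

```latex
\frac{3 + C_0\epsilon}{1 - C_0\epsilon} \le \frac{3 + 1/2}{1 - 1/2} = 7,
\qquad
\frac{1 + C_0\epsilon}{1 - C_0\epsilon} \le \frac{1 + 1/2}{1 - 1/2} = 3,
\qquad\text{hence}\qquad
\|x - x'\|_1 \le (1 + 7)\,\|x - H_k(x)\|_1 + 3\,\|x - x''\|_1 .
```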
The following corollary is implied by Lemma 21.

Corollary 27. For every constant $\gamma_0 > 0$, there is an $\epsilon_0$ only depending on $\gamma_0$ such that if $\epsilon \le \epsilon_0$ the following holds. Assume condition (6) of Lemma 21 holds. Then,
$$\|x - x^{s+1}\|_1 \le \gamma_0\|x - x^s\|_1.$$


Proof. Let $t_0 \in [D]$ be the value computed in Line 12 of the procedure Recover, and $t \in [D]$ be the value guaranteed to exist by Lemma 21. From the fact that the algorithm picks $t_0$ to be the minimizer of the quantity $\|Mx - M(x^s + \Delta^{s,t})\|_1$ over all $t \in [D]$, we have that
$$\|Mx - M(x^s + \Delta^{s,t_0})\|_1 \le \|Mx - M(x^s + \Delta^{s,t})\|_1.$$
Note that $x^s$ is $k$-sparse and $\Delta^{s,t_0}$ and $\Delta^{s,t}$ are $(2k)$-sparse. Thus we can apply Proposition 26 and deduce that
$$\|x - (x^s + \Delta^{s,t_0})\|_1 \le \Big(1 + \frac{3 + C_0\epsilon}{1 - C_0\epsilon}\Big)\|x - H_k(x)\|_1 + \frac{1 + C_0\epsilon}{1 - C_0\epsilon} \cdot \|x - (x^s + \Delta^{s,t})\|_1.$$
Plugging the bound implied by Lemma 21 and (6) into the above inequality, we get
$$\|x - (x^s + \Delta^{s,t_0})\|_1 \le \gamma'\|x - x^s\|_1, \tag{16}$$
where we have defined
$$\gamma' := \epsilon\Big(1 + \frac{3 + C_0\epsilon}{1 - C_0\epsilon}\Big) + \frac{\gamma(1 + C_0\epsilon)}{1 - C_0\epsilon}.$$
Now, we can write
\begin{align}
\|x - x^{s+1}\|_1 &= \|x - H_k(x^s + \Delta^{s,t_0})\|_1 \notag\\
&\le \|x - (x^s + \Delta^{s,t_0})\|_1 + \|x^s + \Delta^{s,t_0} - H_k(x^s + \Delta^{s,t_0})\|_1 \tag{17}\\
&\le \|x - (x^s + \Delta^{s,t_0})\|_1 + \|x^s + \Delta^{s,t_0} - H_k(x)\|_1 \tag{18}\\
&\le 2\|x - (x^s + \Delta^{s,t_0})\|_1 + \|x - H_k(x)\|_1 \tag{19}\\
&\le (2\gamma' + \epsilon)\|x - x^s\|_1. \tag{20}
\end{align}
In the above, (17) and (19) use the triangle inequality (after adding and subtracting $x^s + \Delta^{s,t_0}$ inside the norm); (18) uses the fact that $H_k(x)$ and $H_k(x^s + \Delta^{s,t_0})$ are both $k$-sparse by definition and $H_k(x^s + \Delta^{s,t_0})$ is the best approximator of $x^s + \Delta^{s,t_0}$ among all $k$-sparse vectors; and (20) uses (6) and (16). Finally, note that we can choose $\gamma$ and $\epsilon$ small enough so that $2\gamma' + \epsilon \le \gamma_0$. □
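One illustrative (not optimized) way to meet the final condition, assuming $C_0\epsilon \le 1/2$ so that the bounds $7$ and $3$ from Proposition 26 apply to the fractions in $\gamma'$, and writing $\epsilon_0(\gamma)$ for the threshold of Lemma 21:

```latex
\gamma' \le 8\epsilon + 3\gamma
\;\Longrightarrow\;
2\gamma' + \epsilon \le 17\epsilon + 6\gamma ;
\qquad
\gamma = \tfrac{1}{24},\;
\epsilon \le \min\Big\{\tfrac{1}{68},\, \tfrac{1}{2C_0},\, \epsilon_0(\gamma)\Big\}
\;\Longrightarrow\;
2\gamma' + \epsilon \le \tfrac14 + \tfrac14 = \tfrac12 .
```

This matches the choice $\gamma_0 = 1/2$ made for the rest of the analysis.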

For the rest of the analysis, we set $\epsilon$ to be a small enough constant so that

1. $C_0\epsilon \le 1/2$, where $C_0$ is the constant in Theorem 5.

2. $\gamma_0 = 1/2$, where $\gamma_0$ is the constant in Corollary 27.
Observe that for the first iteration of the algorithm, the estimation error is $\|x - x^0\|_1 = \|x\|_1$.

By repeatedly applying the exponential decrease guaranteed by Corollary 27, we see that as long as $s_0 \ge \log(3/\nu)$, we can ensure that at some stage $s \le s_0$ we attain
$$\|x - x^s\|_1 \le C\|x - H_k(x)\|_1 + (\nu/3)\|x\|_1.$$

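Let $x^*$ be the estimate computed at the end of procedure Recover. Recall that both $x^*$ and $x^s$ are $k$-sparse vectors, and that Line 16 of Recover guarantees $\|M(x - x^*)\|_1 \le \|M(x - x^s)\|_1$. Thus, by Proposition 26 we see that
$$\|x - x^*\|_1 \le 8\|x - H_k(x)\|_1 + 3\|x - x^s\|_1 \le (3C + 8) \cdot \|x - H_k(x)\|_1 + \nu\|x\|_1.$$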

Finally, as discussed at the beginning of the analysis, by choosing $\nu := 1/(4NL)$ (and thus, $s_0 = \log(NL) + O(1)$) and using Proposition 23, the analysis (and the proof of Theorem 19) is complete. □
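Since the requirement $s_0 \ge \log(3/\nu)$ with $\nu = 1/(4NL)$ is what fixes the iteration count, a small helper makes the arithmetic explicit. It is a sketch assuming base-2 logarithms (consistent with halving the error in each iteration); the parameter values in the example are arbitrary.

```python
import math


def iterations_needed(n, log2_L):
    """Smallest integer s_0 with s_0 >= log2(3/nu) when nu = 1/(4*N*L) and N = 2**n.

    Here 3/nu = 12*N*L, so log2(3/nu) = log2(12) + n + log2(L).
    """
    return math.ceil(math.log2(12) + n + log2_L)


print(iterations_needed(20, 32))  # 56
```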


4.5 Proof of Lemma 21 

We start with some notation. Let $U$ denote the set of the $k$ largest (in magnitude) coefficients of $x$, and let $V$ be the support of $x^s$. Furthermore, we set $W = U \cup V$ and $z = x - x^s$. That is, $z$ is the vector representing the current estimation error. Note that $|W| \le 2k$ and that $H_k(x) = x_U$. With a slight abuse of notation, we will use the sets $\{0,1\}^n$ and $[N]$ interchangeably (for example, in order to index coordinate positions of $z$) and implicitly assume the natural $n$-bit representation of integers in $[N]$ in doing so.

We first apply the result of Lemma 25 to the vector $z_W$ so as to conclude that, for some $t \in [D]$, according to (5) we have
$$\sum_{\substack{(i,j) \in E^t \setminus \mathrm{First}(G, z_W) \\ i \in W}} |z(i)| \le \epsilon\|z_W\|_1. \tag{21}$$

We fix one particular such choice of $t$ for the rest of the proof. Define the set
$$\mathcal{D} := \{i \in [N] \mid (i, h_t(i)) \in \mathrm{First}(G, z_W)\}.$$
Intuitively, $\mathrm{First}(G, z_W)$ resolves collisions incurred by $h_t$ by picking, for each hash output, only the pre-image with the largest magnitude (according to $z_W$). In other words, $\mathrm{First}(G, z_W)$ induces a partial function from $[N]$ to $\mathbb{F}_2^r$ that is one-to-one, and $\mathcal{D}$ is the domain of this partial function. Using (21), we thus have
$$\|z_{W \setminus \mathcal{D}}\|_1 = \sum_{i \in W \setminus \mathcal{D}} |z(i)| \le \epsilon\|z_W\|_1 \le \epsilon\|z\|_1. \tag{22}$$
Define, for any $i \in [N]$,
$$d_i := \sum_{i' \in h_t^{-1}(h_t(i)) \setminus \{i\}} |z(i')|. \tag{23}$$
Intuitively, with respect to the hash function $h_t$, the quantity $d_i$ collects all the mass from elsewhere that falls into the same bin as $i$.

Our aim is to show that $\Delta^{s,t}$, which is the estimate of the error vector produced by the algorithm, recovers “most” of the coefficients in $z_W$, and is therefore “close” to the actual error vector $z$.

Our analysis will focus on coefficients in $z_W$ that are “good” in the following sense. Formally, we define the set of good coefficients $\mathcal{G}$ to contain coefficients $i$ such that:

1. $i \in W \cap \mathcal{D}$, and,

2. $d_i < \delta|z(i)|$, for some small parameter $\delta < 1/4$ to be determined later.
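To make $d_i$ and the two conditions concrete, here is a toy computation. It is a hypothetical sketch: $h_t$ is the linear map keeping the two low-order bits, and the set $\mathcal{D}$ is supplied directly (as the per-bin maximizer of $|z|$ within $W$) instead of being derived from $\mathrm{First}(G, z_W)$.

```python
def bin_mass_and_good(z, h, W, dom, delta):
    """Compute d_i (mass of the other members of i's bin under h) and the good set.

    z: dict index -> value (zero entries may be omitted); h: hash for the fixed t;
    W, dom: sets standing in for W and the domain set D; delta: the parameter delta.
    """
    bins = {}
    for i in z:
        bins.setdefault(h(i), []).append(i)
    d = {i: sum(abs(z[i2]) for i2 in bins[h(i)] if i2 != i) for i in z}
    good = {i for i in W & dom if i in z and d[i] < delta * abs(z[i])}
    return d, good


def h(i):
    return i & 0b11                 # toy linear hash onto 2 bits (hypothetical)


z = {1: 10.0, 5: 0.5, 2: -4.0, 6: 3.0, 3: 1.0}
W = {1, 2, 3, 6}
dom = {1, 2, 3}                     # per-bin largest |z| within W
d, good = bin_mass_and_good(z, h, W, dom, delta=0.2)
print(d[1], d[2], sorted(good))     # 0.5 3.0 [1, 3]
```

Index $2$ is in $W \cap \mathcal{D}$ but fails the second condition, since its bin-mate $6$ carries mass $3 \ge 0.2 \cdot 4$.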

Intuitively, $\mathcal{G}$ is the set of coefficients $i$ that “dominate” their bucket mass $y(h_t(i))$. Thus applying the binary search on any such bucket (i.e., procedure SEARCH$(h_t(i), t, s)$) will return the correct value $i$ (note that the above definition implies that for any $i \in \mathcal{G}$, we must have $z(i) \neq 0$, and thus the binary search would not degenerate). More formally, we have the following.

Proposition 28. For any $i \in \mathcal{G}$, the procedure SEARCH$(h_t(i), t, s)$ returns $i$.

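Proof. Consider the sequence $(u_1, \ldots, u_n)$ produced by the procedure Search, let $j := h_t(i)$, and consider any $b \in [n]$. Recall that $y^{s,t,0} = M^t z$ and $y^{s,t,b} = (M^t \otimes B_b)z$. Since $i \in \mathcal{G}$, we have $d_i < \delta|z(i)| < |z(i)|/2$. Therefore,
$$|z(i)|(1 - \delta) \le |y^{s,t,0}(j)| \le |z(i)|(1 + \delta). \tag{24}$$
Let $v \in \{0,1\}$ be the $b$th bit in the $n$-bit representation of $i$. Let $S$ be the set of those elements in $h_t^{-1}(j) \subseteq \{0,1\}^n$ whose $b$th bit is equal to 1. Note that $i \in S$ iff $v = 1$. Recall that
$$y^{s,t,b}(j) = \sum_{i' \in S} z(i').$$
Whenever $i \notin S$, we get
$$|y^{s,t,b}(j)| = \Big|\sum_{i' \in S} z(i')\Big| \le \sum_{i' \in S} |z(i')| \le d_i < \delta|z(i)| \le \frac{\delta}{1 - \delta}|y^{s,t,0}(j)|,$$
according to the definition of $d_i$ and (24). On the other hand, when $i \in S$, we have
$$|y^{s,t,b}(j)| \ge |z(i)| - \sum_{i' \in S \setminus \{i\}} |z(i')| \ge |z(i)| - d_i > |z(i)|(1 - \delta) \ge \frac{1 - \delta}{1 + \delta}|y^{s,t,0}(j)|,$$
again according to the definition of $d_i$ and (24). Thus, the procedure Search will be able to distinguish between the two cases $i \in S$ and $i \notin S$ (equivalently, $v = 1$ and $v = 0$) and correctly set $u_b = v$ provided that
$$\frac{\delta}{1 - \delta} \le \frac{1}{2} \quad\text{and}\quad \frac{1 - \delta}{1 + \delta} \ge \frac{1}{2},$$
which is true according to the choice $\delta < 1/4$. □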
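The threshold test behind Proposition 28 can be replayed on a single bin. In this hypothetical example, index $i = 1011_2$ has $|z(i)| = 10$ and shares its bin with one other index of mass $1.5$, so $d_i/|z(i)| = 0.15 < 1/4$; the per-bit comparisons recover $i$ regardless of the sign of $z(i)$.

```python
def bin_sketch(members, n):
    """One bin of the sketch: y0 = sum of z over the bin, yb[b] = sum over members with bit b set."""
    y0 = sum(members.values())
    yb = [sum(v for i, v in members.items() if (i >> b) & 1) for b in range(n)]
    return y0, yb


def search_bin(y0, yb):
    """SEARCH on a single bin: bit b of the output is 1 iff |yb[b]| > |y0| / 2."""
    return sum(1 << b for b, v in enumerate(yb) if abs(v) > abs(y0) / 2)


# i = 0b1011 dominates its bin; the colliding i' = 0b0110 gives d_i = 1.5 < 0.2 * 10.
y0, yb = bin_sketch({0b1011: 10.0, 0b0110: 1.5}, 4)
print(search_bin(y0, yb))  # 11
```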

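By rewriting assumption (6) of the lemma, we know that
$$\|x - H_k(x)\|_1 < \epsilon\|x - x^s\|_1 = \epsilon\|z\|_1,$$
and thus,
$$\|z_{\overline{W}}\|_1 = \|x_{\overline{W}}\|_1 \le \|x_{\overline{U}}\|_1 = \|x - H_k(x)\|_1 < \epsilon\|z\|_1, \tag{25}$$
where the first equality uses the fact that $x$ and $z = x - x^s$ agree outside $V = \mathrm{supp}(x^s)$ (and thus, outside $W$), and we also recall that $U \subseteq W$.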

Observe that for each $i, i' \in \mathcal{D}$ such that $i \neq i'$, we have $h_t(i) \neq h_t(i')$ (since $\mathrm{First}(G, z_W)$ picks exactly one edge adjacent to the right vertex $h_t(i)$, namely $(i, h_t(i))$, and exactly one adjacent to $h_t(i')$, namely $(i', h_t(i'))$). In other words, for each $i \in \mathcal{D}$, the set $h_t^{-1}(h_t(i))$ cannot contain any element of $\mathcal{D}$ other than $i$. Therefore, we have
$$\sum_{i \in W \cap \mathcal{D}} d_i \le \|z_{\overline{\mathcal{D}}}\|_1 \le \|z_{W \setminus \mathcal{D}}\|_1 + \|z_{\overline{W}}\|_1 \le 2\epsilon\|z\|_1, \tag{26}$$
where for the last inequality we have used (22) and (25).

Now we show that a substantial portion of the $\ell_1$ mass of $z$ is collected by the set of good indices $\mathcal{G}$.

Lemma 29. $\sum_{i \in \mathcal{G}} |z(i)| \ge (1 - 2\epsilon(1 + 1/\delta))\|z\|_1$.

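Proof. We will upper bound $\sum_{i \notin \mathcal{G}} |z(i)|$. In order to do so, we decompose this sum into three components bounded as follows:

• $\sum_{i \notin W} |z(i)| \le \epsilon\|z\|_1$ (according to (25)).

• $\sum_{i \in W \setminus \mathcal{D}} |z(i)| \le \epsilon\|z\|_1$ (according to (22)).

• $\sum_{i \in (W \cap \mathcal{D}) \setminus \mathcal{G}} |z(i)| \le 2\epsilon\|z\|_1/\delta$. In order to verify this claim, observe that from the definition of $\mathcal{G}$, every $i \in (W \cap \mathcal{D}) \setminus \mathcal{G}$ satisfies $|z(i)| \le d_i/\delta$. Therefore, the left-hand side summation is at most $\sum_{i \in W \cap \mathcal{D}} d_i/\delta$, and the bound follows using (26).

By adding up the above three partial summations, the claim follows. □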

Lemma 29 shows that it suffices to recover most of the coefficients $z(i)$ for $i \in \mathcal{G}$ in order to recover most of the $\ell_1$ mass in $z$. This is guaranteed by the following lemma.
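Lemma 30. There is a $\beta > 0$ only depending on $\epsilon$ and $\delta$ such that $\beta \le 2\delta + 4\epsilon(1 + 1/\delta)$ (in particular, $\beta$ can be made an arbitrarily small constant by choosing $\delta$ and then $\epsilon$ small enough) and
$$\sum_{i \in \mathcal{G},\, h_t(i) \in T} |z(i)| \ge (1 - \beta)\|z\|_1,$$
where $T$ is the set defined in Line 3 of the procedure Estimate.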

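Proof. Consider the bin vector $y := y^{s,t,0} = M^t z$. From the choice of $T$ as the set picking the largest $2k$ coefficients of $y$, it follows that for all $j \in T \setminus h_t(\mathcal{G})$ and $j' \in h_t(\mathcal{G}) \setminus T$ (where $h_t(\mathcal{G})$ denotes the set $\{h_t(i) \mid i \in \mathcal{G}\}$) we have $|y(j)| \ge |y(j')|$. Since $|T| = 2k$ and $|h_t(\mathcal{G})| \le 2k$ (because $\mathcal{G} \subseteq W$ and $|W| \le 2k$), it follows that $|T \setminus h_t(\mathcal{G})| \ge |h_t(\mathcal{G}) \setminus T|$. Therefore,
$$\sum_{j \in h_t(\mathcal{G}) \setminus T} |y(j)| \le \sum_{j \in T \setminus h_t(\mathcal{G})} |y(j)|.$$
Now, using Lemma 29 we can deduce the following:
$$\sum_{j \in T \setminus h_t(\mathcal{G})} |y(j)| \le \sum_{i \notin \mathcal{G}} |z(i)| \le 2\epsilon(1 + 1/\delta)\|z\|_1, \tag{27}$$
where for the first inequality we note that $y(j) = \sum_{i \in h_t^{-1}(j)} z(i)$, and that the sets $h_t^{-1}(j)$ for various $j$ are disjoint and cannot intersect $\mathcal{G}$ unless, by definition, $j \in h_t(\mathcal{G})$.

Recall that for every $i \in \mathcal{G}$, by the definition of $\mathcal{G}$ we have
$$(1 - \delta)|z(i)| \le |y(h_t(i))| \le (1 + \delta)|z(i)|.$$
Using this (and the fact that $h_t$ is one-to-one on $\mathcal{G} \subseteq \mathcal{D}$), it follows that
\begin{align*}
\sum_{i \in \mathcal{G},\, h_t(i) \in T} |z(i)|
&\ge \frac{1}{1 + \delta} \sum_{j \in h_t(\mathcal{G}) \cap T} |y(j)|
= \frac{1}{1 + \delta}\Big(\sum_{j \in h_t(\mathcal{G})} |y(j)| - \sum_{j \in h_t(\mathcal{G}) \setminus T} |y(j)|\Big) \\
&\ge \frac{1}{1 + \delta}\Big((1 - \delta)\sum_{i \in \mathcal{G}} |z(i)| - \sum_{j \in h_t(\mathcal{G}) \setminus T} |y(j)|\Big) \\
&\ge \frac{(1 - \delta)(1 - 2\epsilon(1 + 1/\delta)) - 2\epsilon(1 + 1/\delta)}{1 + \delta}\,\|z\|_1 =: (1 - \beta)\|z\|_1,
\end{align*}
where the last step follows from Lemma 29 and (27). □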
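The closed form for $\beta$ at the end of this proof is easy to evaluate numerically. The sketch below does so; note that for fixed $\delta$ it tends to $2\delta/(1 + \delta)$, not to $0$, as $\epsilon \to 0$, so both $\delta$ and $\epsilon$ have to be chosen small when the constants are fixed at the end of the proof of Lemma 21. The sample values of $\epsilon$ and $\delta$ are arbitrary.

```python
def beta(eps, delta):
    """beta from the last display of the proof of Lemma 30:
    1 - beta = ((1 - delta)(1 - a) - a) / (1 + delta), with a = 2 eps (1 + 1/delta)."""
    a = 2 * eps * (1 + 1 / delta)
    return 1 - ((1 - delta) * (1 - a) - a) / (1 + delta)


print(round(beta(0.0, 0.01), 4))    # 0.0198  (= 2*delta/(1+delta), the eps -> 0 limit)
print(round(beta(1e-5, 0.01), 4))   # 0.0238
```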

We are now ready to conclude the proof of Lemma 21. First, observe using Proposition 28 that for coordinates $i \in \mathcal{G}$ such that $h_t(i) \in T$, we have $\Delta^{s,t}(i) = y(h_t(i))$ and that, since $i \in \mathcal{G}$,
$$|y(h_t(i)) - z(i)| \le d_i < \delta|z(i)|. \tag{28}$$
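Therefore, for such choices of $i$, $|\Delta^{s,t}(i) - z(i)| < \delta|z(i)|$. Thus we have
\begin{align}
\|\Delta^{s,t} - z\|_1 &= \sum_{i \in \mathcal{G} \cap h_t^{-1}(T)} |\Delta^{s,t}(i) - z(i)| + \sum_{i \notin \mathcal{G} \cap h_t^{-1}(T)} |\Delta^{s,t}(i) - z(i)| \tag{29}\\
&\le \delta\|z\|_1 + \sum_{i \notin h_t^{-1}(T)} |\Delta^{s,t}(i) - z(i)| + \sum_{i \in h_t^{-1}(T) \setminus \mathcal{G}} |\Delta^{s,t}(i) - z(i)| \tag{30}\\
&= \delta\|z\|_1 + \sum_{i \notin h_t^{-1}(T)} |z(i)| + \sum_{i \in h_t^{-1}(T) \setminus \mathcal{G}} |\Delta^{s,t}(i) - z(i)| \notag\\
&\le \delta\|z\|_1 + \sum_{i \notin h_t^{-1}(T)} |z(i)| + \sum_{i \in h_t^{-1}(T) \setminus \mathcal{G}} \big(|\Delta^{s,t}(i)| + |z(i)|\big) \notag\\
&= \delta\|z\|_1 + \sum_{i \notin \mathcal{G} \cap h_t^{-1}(T)} |z(i)| + \sum_{i \in h_t^{-1}(T) \setminus \mathcal{G}} |\Delta^{s,t}(i)| \notag\\
&\le (\delta + \beta)\|z\|_1 + \sum_{i \in h_t^{-1}(T) \setminus \mathcal{G}} |\Delta^{s,t}(i)|. \tag{31}
\end{align}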


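In the above, (30) uses (28), the equality following (30) uses the fact that procedure Estimate only assigns $\Delta^{s,t}(u)$ when $h_t(u) \in T$, and (31) uses Lemma 30. Now, for each $i \in h_t^{-1}(T) \setminus \mathcal{G}$ such that $\Delta^{s,t}(i) \neq 0$, the algorithm by construction sets $\Delta^{s,t}(i) = y^{s,t,0}(h_t(i)) = \sum_{i' \in h_t^{-1}(h_t(i))} z(i')$. Observe that in this case, we must have $h_t^{-1}(h_t(i)) \cap \mathcal{G} = \emptyset$. This is because if there is some $i' \in h_t^{-1}(h_t(i)) \cap \mathcal{G}$, the for loop in procedure Estimate, upon processing the element $h_t(i) = h_t(i')$ in the set $T$, would call SEARCH$(h_t(i'), t, s)$, which would return $i'$ rather than $i$ according to Proposition 28 (since $i' \in \mathcal{G}$), making the algorithm estimate the value of $\Delta^{s,t}(i')$ and leave $\Delta^{s,t}(i)$ zero. Therefore,
$$\sum_{i \in h_t^{-1}(T) \setminus \mathcal{G}} |\Delta^{s,t}(i)| \le \sum_{i \notin \mathcal{G}} |z(i)| \le \beta\|z\|_1,$$
the last inequality being true according to Lemma 29 (note that $\beta \ge 2\epsilon(1 + 1/\delta)$ by the explicit expression for $\beta$ in the proof of Lemma 30, since $\delta < 1/4$). Plugging this result back into (31), we get that
$$\|x - (x^s + \Delta^{s,t})\|_1 = \|\Delta^{s,t} - z\|_1 \le (\delta + 2\beta)\|z\|_1 = (\delta + 2\beta)\|x - x^s\|_1.$$
The proof of Lemma 21 is now complete by choosing $\delta$ and $\epsilon$ (and thus $\beta$) to be small enough constants so that $\delta + 2\beta \le \gamma$.

□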


5 Speeding up the algorithm using randomness 

Although this work focuses on deterministic algorithms for sparse Hadamard transform, in this 
section we show that our algorithm in Figure 1 can be significantly sped up by using randomness 
(yet preserving non-adaptivity). 

The main intuition is straightforward: in the for loop of Line 7, most choices of $t$ turn
out to be equally useful for improving the approximation error of the algorithm. Thus, instead of
trying all possibilities of $t$, it suffices to just pick one random choice. However, since the error $\epsilon$ of
the condenser is a constant, the “success probability” of picking a random $t$ has to be amplified.
This can be achieved by either 1) designing the error of the condenser to be small enough to begin with; or
2) picking a few independent random choices of $t$, trying each such choice, and then estimating
the choice that leads to the best improvement. It turns out that the former option can be rather
wasteful in that it may increase the output length of the condenser (and subsequently, the overall
sample complexity and running time) by a substantial factor. In this section, we pursue the second
approach, which leads to nearly optimal results.

In this section, we consider a revised algorithm that:

• instead of looping over all choices of $t$ in Line 7 of procedure Recover, just runs the loop over a few random choices of $t$;

• in Line 16, instead of minimizing $\|Mx - Mx^s\|_1$, performs the minimization with respect to a randomly sub-sampled submatrix of $M$ obtained by restricting $M$ to a few random and independent choices of $t$.

The above randomized version of procedure Recover is called procedure Recover' in the sequel, and is depicted in Figure 2. The algorithm chooses an integer parameter $q$ which determines the needed number of samples for $t$. In the algorithm, we use the notation $M^{\mathcal{T}}$, where $\mathcal{T} \subseteq [D]$ is a multi-set, to denote the $|\mathcal{T}|2^r \times N$ matrix obtained by stacking the matrices $M^t$ for all $t \in \mathcal{T}$ on top of one another. Note that the algorithm repeatedly uses fresh samples of $t$ as it proceeds. This eliminates possible dependencies as the algorithm proceeds and simplifies the analysis.

More formally, our goal in this section is to prove the following randomized analogue of Theorem 19. Since the running time of the randomized algorithm may in general be less than the sketch length ($2^r D(n + 1)$), we assume that the randomized algorithm receives the sketch implicitly and has query access to this vector.

Theorem 31. (Analogue of Theorem 19) There are absolute constants $c > 0$ and $\epsilon' > 0$ such that the following holds. Let $k, n$ ($k < n$) be positive integer parameters, and suppose there exists a function $h\colon \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$ (where $r < n$) computable in time $f(n)$ (where $f(n) = \Omega(n)$) which is an explicit $(\log(4k), \epsilon')$-lossless condenser. Let $M$ be the adjacency matrix of the bipartite graph associated with $h$ and $B$ be the bit-selection matrix with $n$ rows and $N := 2^n$ columns. Then, there is a randomized algorithm that, given $k$, $n$, parameters $\eta, \nu > 0$, and query access to the vectors $Mx$ and $(M \otimes B)x$ for some $x \in \mathbb{R}^N$ (which is not given to the algorithm), computes a $k$-sparse estimate $\tilde{x}$ such that, with probability at least $1 - \eta$ over the random coin tosses of the algorithm,
$$\|\tilde{x} - x\|_1 \le c \cdot \|x - H_k(x)\|_1 + \nu\|x\|_1.$$
Moreover, execution of the algorithm takes $O(2^r \cdot \log(\log(1/\nu)/\eta) \cdot \log(1/\nu) \cdot f(n))$ arithmetic operations in the worst case.
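RECOVER'($y$, $s_0$, $q$)
 1  $s = 0$.
 2  Let $B_1, \ldots, B_n \in \{0,1\}^{1 \times N}$ be the rows of the bit selection matrix $B$.
 3  Initialize $x^0 \in \mathbb{R}^N$ as $x^0 = 0$.
 4  for $(t, b, j) \in [D] \times \{0, \ldots, n\} \times \mathbb{F}_2^r$
 5      $y^{0,t,b}(j) = y(j, t, b)$.
 6  repeat
 7      Let $\mathcal{T}^s \subseteq [D]$ be a multiset of $q$ uniformly and independently random elements.
 8      for $t \in \mathcal{T}^s$
 9          $y^{s,t,0} \leftarrow M^t \cdot (x - x^s) \in \mathbb{R}^{2^r}$.
10          for $b \in [n]$
11              $y^{s,t,b} \leftarrow (M^t \otimes B_b) \cdot (x - x^s) \in \mathbb{R}^{2^r}$.
12          $\Delta^{s,t} = \mathrm{ESTIMATE}(t, s)$.
13      Let $\mathcal{T}'^s \subseteq [D]$ be a multiset of $q$ uniformly and independently random elements.
14      Let $t_0$ be the choice of $t \in \mathcal{T}^s$ that minimizes $\|M^{\mathcal{T}'^s}x - M^{\mathcal{T}'^s}(x^s + \Delta^{s,t})\|_1$.
15      $x^{s+1} = H_k(x^s + \Delta^{s,t_0})$.
16      $s = s + 1$.
17  until $s = s_0$.
18  Let $\mathcal{T}'' \subseteq [D]$ be a multiset of $q$ uniformly and independently random elements.
19  Set $x^*$ to be the choice of $x^s$ (for $s = 0, \ldots, s_0$) that minimizes $\|M^{\mathcal{T}''}x - M^{\mathcal{T}''}x^s\|_1$.
20  return $x^*$.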

Figure 2: Pseudo-code for the randomized version of the algorithm Recover. The algorithm 
receives y implicitly and only queries y at a subset of the positions. The additional integer parameter 
q is set up by the analysis. 
Proof of Theorem 31 is deferred to Section 5.1. In the sequel, we instantiate this theorem for use in the sparse Hadamard transform application. Specifically, we consider the additional effect on the running time incurred by the initial sampling stage; that is, computation of the input to the algorithm in Figure 2 from the information provided in $\hat{x} = \mathrm{DHT}(x)$.

First, notice that all the coin tosses of the algorithm in Figure 2 (namely, the multisets $\mathcal{T}^s$ and $\mathcal{T}'^s$ for all iterations $s$, and $\mathcal{T}''$) can be performed when the algorithm starts, due to the fact that each random sample $t \in [D]$ is distributed uniformly and independently of the algorithm’s input and other random choices. Therefore, the sampling stage needs to compute $M^t x$ and $(M^t \otimes B)x$ for all the $(2s_0 + 1)q$ random choices of $t$ made by the algorithm.

For $t \in [D]$, let $V_t$ be the $(n - r)$-dimensional subspace of $\mathbb{F}_2^n$ which is the kernel of the linear function $h_t$. Moreover, let $V_t^{\perp}$ and $W_t$ respectively denote the dual and complement of $V_t$ (as in Lemma 15). As discussed in the proof of Theorem 20, for each $t \in [D]$, we can use Lemma 9 to compute $M^t x$ from query access to $\hat{x} = \mathrm{DHT}(x)$ at $O(2^r)$ points and using $O(2^r n)$ arithmetic operations, assuming that a basis for $V_t^{\perp}$ and $W_t$ is known. Similarly, $(M^t \otimes B)x$ may be computed using $2^r n^{O(1)}$ operations and by querying $\hat{x}$ at $O(2^r n)$ points.

Computation of a basis for $V_t^{\perp}$ and $W_t$ for a given $t$ can in general be performed (see footnote) using Gaussian elimination in time $O(n^3)$. Therefore, the additional time for the pre-processing needed for computation of such bases for all choices of $t$ picked by the algorithm is $O(q s_0 n^3)$.
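The Gaussian-elimination step can be sketched as follows for the kernel $V_t$ itself, assuming (hypothetically) that the linear map $h_t$ is given by $r$ bit-mask rows over $n$ input bits. The routine brings the rows to reduced row-echelon form over $\mathbb{F}_2$ and emits one kernel vector per free column; computing bases of $V_t^{\perp}$ and $W_t$ is analogous and not shown.

```python
def kernel_basis(rows, n):
    """Basis (as bit masks) of the kernel of the linear map F_2^n -> F_2^r whose
    output bit c is the parity of rows[c] & x; Gauss-Jordan elimination over F_2."""
    pivots = {}                          # pivot column -> reduced row (bit mask)
    for row in rows:
        for col, piv in pivots.items():  # clear existing pivot columns from the new row
            if (row >> col) & 1:
                row ^= piv
        if row:
            col = row.bit_length() - 1   # new pivot column (not a pivot yet)
            for c in pivots:             # keep every pivot column unique to its row
                if (pivots[c] >> col) & 1:
                    pivots[c] ^= row
            pivots[col] = row
    basis = []
    for free in range(n):                # one kernel vector per free column
        if free in pivots:
            continue
        v = 1 << free
        for col, piv in pivots.items():
            if (piv >> free) & 1:
                v |= 1 << col
        basis.append(v)
    return basis


print(kernel_basis([0b0011, 0b0110], 4))  # [7, 8]
```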

Altogether, we see that the pre-processing stage in total takes

$$O\big(q s_0 (2^r r + n^2) n\big) = O\big(\log(\log(1/\nu)/\eta)\cdot\log(1/\nu)\cdot(2^r r + n^2)\, n\big)$$

arithmetic operations.

Finally, we instantiate the randomized sparse DHT algorithm using Theorem 31, the pre-processing discussed above, and the lossless condensers constructed by the Leftover Hash Lemma (Theorem 41). As for the linear family of hash functions required by the Leftover Hash Lemma, we use the linear family $\mathcal{H}_{\mathrm{lin}}$, which is defined in Section 4.3. Informally, a hash function in this family corresponds to an element $\beta$ of the finite field $\mathbb{F}_{2^n}$. Given an input $x$, the function interprets $x$ as an element of $\mathbb{F}_{2^n}$ and then outputs the bit representation of $\beta \cdot x$ truncated to the desired $r$ bits. We remark that the condenser obtained in this way is computable in time $f(n) = O(n \log n)$ using the FFT-based multiplication algorithm over $\mathbb{F}_{2^n}$. The simplicity of this condenser and the mild hidden constants in the asymptotics are particularly appealing for practical applications.

Recall that for the Leftover Hash Lemma, we have $2^r = O(k/\epsilon'^2) = O(k)$, which is asymptotically optimal. Using this in the above running time estimate, we see that the final randomized version of the sparse DHT algorithm performs

$$O\big(\log(\log(1/\nu)/\eta)\cdot\log(1/\nu)\cdot(k\log k + n^2)\cdot n\big)$$

arithmetic operations in the worst case to succeed with probability at least $1 - \eta$.

* For structured transformations it is possible to do better (see [14]). This is the case for the particular instantiation of the Leftover Hash Lemma that we use in this section. However, we do not attempt to optimize this computation, since it only incurs an additive term that is poly-logarithmic in $N$, which affects the asymptotic running time only for very small $k$.

Finally, recalling that an algorithm that computes an estimate satisfying (4) can be transformed into one satisfying (2) using Proposition 23, we obtain the final result of this section (which also yields Theorem 3); it follows from the above discussion combined with Theorem 31.

Corollary 32 (Generalization of Theorem 3). There is a randomized algorithm that, given integers $k, n$ (where $k \le N$), parameters $\eta > 0$ and $\nu > 0$, and (non-adaptive) query access to any $\hat{x} \in \mathbb{R}^N$ (where $N := 2^n$), outputs $\tilde{x} \in \mathbb{R}^N$ that, with probability at least $1 - \eta$ over the internal random coin tosses of the algorithm, satisfies

$$\|\tilde{x} - x\|_1 \le c\|x - H_k(x)\|_1 + \nu\|x\|_1,$$

for some absolute constant $c > 0$, where $\hat{x} = \mathrm{DHT}(x)$. Moreover, the algorithm performs

$$O\big(\log(\log(1/\nu)/\eta)\cdot\log(1/\nu)\cdot(k\log k + n^2)\cdot n\big)$$

arithmetic operations† in the worst case to compute $\tilde{x}$. Finally, when each coefficient of $x$ takes $O(n)$ bits to represent, the algorithm can be set up to output $\tilde{x}$ satisfying

$$\|\tilde{x} - x\|_1 \le c\|x - H_k(x)\|_1,$$

using $O(\log(n/\eta)\cdot(k\log k + n^2)\cdot n^2)$ arithmetic operations in the worst case. In particular, when $\eta = 1/n^{O(1)}$ and $k = \Omega(n^2) = \Omega(\log^2 N)$, the algorithm runs in worst-case time $O(kn^2(\log k)(\log n)) = \tilde{O}(k(\log N)^3)$. □
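For the reader's convenience, the simplification at the end of Corollary 32 can be spelled out as follows. This is a routine calculation under the assumptions $\log(1/\eta) = O(\log n)$, $k = \Omega(n^2)$ and $k \le N = 2^n$:

$$
\begin{aligned}
O\big(\log(n/\eta)\cdot(k\log k + n^2)\cdot n^2\big)
&= O\big(kn^2(\log k)(\log n)\big) && \text{since } \log(n/\eta) = O(\log n) \text{ and } n^2 = O(k\log k),\\
&= O\big(kn^3\log n\big) && \text{since } \log k \le \log N = n,\\
&= \tilde{O}\big(k(\log N)^3\big) && \text{since } n = \log N.
\end{aligned}
$$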

5.1 Proof of Theorem 31 

The proof is quite similar to that of Theorem 19; therefore, in this section we only describe the modifications to the proof of Theorem 19 that lead to the conclusion of Theorem 31.

5.1.1 Correctness analysis of the randomized sparse recovery algorithm 

Similar to the proof of Theorem 19, our goal is to set up the randomized algorithm so that, given arbitrarily small parameters $\nu, \eta > 0$, it outputs a $k$-sparse estimate $\tilde{x} \in \mathbb{R}^N$ that, with probability at least $1 - \eta$ (over the random coin tosses of the algorithm), satisfies (4), recalled below, for an absolute constant $C > 0$:

$$\|\tilde{x} - x\|_1 \le C\|x - H_k(x)\|_1 + \nu\|x\|_1. \qquad (4)$$

† We remark that the running time estimate counts $O(n)$ operations for indexing, that is, for looking up $\hat{x}(i)$ for an index $i \in [N]$, and one operation for writing down the result.





As in the proof of Theorem 19, by Proposition 23, once we have such a guarantee for some $\nu = O(1/(NL))$, assuming that $x$ has integer coordinates in the range $[-L, +L]$, rounding the final result vector to the nearest integer vector yields the guarantee in (2).

We will also use the following “error amplification” result, which can be proved using standard concentration bounds.

Lemma 33. Suppose $h: \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$ is a $(\kappa, \epsilon)$-lossless condenser. For any set $S \subseteq \mathbb{F}_2^n$ with $|S| \le 2^{\kappa}$, the following holds. Let $q \in \mathbb{N}$ be a parameter and let $t_1, \ldots, t_q \in [D]$ be drawn uniformly and independently at random. Let $h': \mathbb{F}_2^n \times [q] \to \mathbb{F}_2^r$ be defined as $h'(x, j) := h(x, t_j)$, and let $G$ be the bipartite graph associated with $h'$. Let $T$ be the neighborhood in $G$ of the set of left vertices defined by $S$. Then, with probability at least $1 - \exp(-\epsilon^2 q/4)$ (over the randomness of $t_1, \ldots, t_q$), we have $|T| \ge (1 - 2\epsilon)q|S|$.

Proof. Let $G^0$ be the bipartite graph associated with $h$, with $N$ left vertices and $D2^r$ right vertices, and for each $t \in [D]$, denote by $G^t$ the bipartite graph associated with $h_t := h(\cdot, t)$, each having $N$ left vertices and $2^r$ right vertices. Recall that $G^0$ consists of the union of the edge sets of $G^1, \ldots, G^D$ (with shared left vertex set $[N]$ and disjoint right vertex sets), and that $G$ consists of the union of the edge sets of $G^{t_1}, \ldots, G^{t_q}$. Let $T^0$ be the set of right neighbors of $S$ in $G^0$. Similarly, let $T^t$ ($t \in [D]$) be the set of right neighbors of $S$ in $G^t$.

Since $h$ is a lossless condenser, we know that $|T^0| \ge (1 - \epsilon)D|S|$. For $i \in [q]$, let $X_i \in [0, 1]$ be such that $|T^{t_i}| = (1 - X_i)|S|$, and define $X := X_1 + \cdots + X_q$. By an averaging argument, we see that $\mathbb{E}[X_i] \le \epsilon$. Moreover, the random variables $X_1, \ldots, X_q$ are independent. Therefore, by a Chernoff bound,

$$\Pr[X > 2\epsilon q] \le \exp(-\epsilon^2 q/4).$$

The claim follows after observing that $|T| = (q - X)|S|$ (since the graph $G$ is composed of the union of $G^{t_1}, \ldots, G^{t_q}$ with disjoint right vertex sets). □
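The counting identity at the end of the proof can be checked on a toy example. The Python sketch below (function names hypothetical) builds the right neighborhood $T$ of $S$ in the graph of $h'(x, j) := h(x, t_j)$, keeping the right vertex sets of different positions $j$ disjoint, and computes $X = X_1 + \cdots + X_q$. The multiplicative `toy_hash` is only a stand-in and is not claimed to be a lossless condenser, so the example illustrates $|T| = (q - X)|S|$ rather than the expansion guarantee itself.

```python
import random
from typing import Callable, List, Sequence, Tuple


def neighborhood_size(h: Callable[[int, int], int], S: Sequence[int],
                      seeds: List[int]) -> Tuple[int, float]:
    """Return (|T|, X) for the graph of h'(x, j) := h(x, t_j).

    Right vertices for different positions j are disjoint, so T is the disjoint
    union of the images h(S, t_j); X_j := 1 - |h(S, t_j)| / |S| and X := sum_j X_j.
    """
    T = {(j, h(x, t)) for j, t in enumerate(seeds) for x in S}
    X = sum(1 - len({h(x, t) for x in S}) / len(S) for t in seeds)
    return len(T), X


def toy_hash(x: int, t: int) -> int:
    """Hypothetical stand-in for h: multiplicative hashing of a 10-bit x to 6 bits."""
    return ((x * (2 * t + 1)) % 1024) >> 4


rng = random.Random(1)
S = rng.sample(range(1024), 16)                    # the fixed set S
seeds = [rng.randrange(1024) for _ in range(20)]   # t_1, ..., t_q with q = 20
size, X = neighborhood_size(toy_hash, S, seeds)
```

As required by the lemma, $S$ is fixed before the seeds are drawn.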

Note that the above lemma requires the set $S$ to be determined and fixed before the random seeds $t_1, \ldots, t_q$ are drawn. Thus, the lemma makes no claim about the case where an adversary chooses $S$ based on the outcomes of the random seeds.

In the sequel, we set the error of the randomness condenser (which we denote by $\epsilon'$) to be $\epsilon' \le \epsilon/2$, where $\epsilon$ is the constant from Theorem 20.

We observe that the result reported in Lemma 25 only uses the expansion property of the underlying bipartite graph with respect to the particular support of the vector $w$. Thus, assuming that the conclusion of Lemma 33 holds when the set $S$ in the lemma is the support of a $k'$-sparse vector $w$ (where in our case $k' = 4k$), we may use the conclusion of Lemma 25 that, for some $t \in \{t_1, \ldots, t_q\}$,

$$\sum_{(j,t) \in E \setminus \mathrm{First}(G, w)} |w(j)| \le \epsilon\|w\|_1.$$
Using the above observation, we can deduce an analogue of Lemma 21 for the randomized case by noting that Lemma 21 holds as long as the set $W$ in its proof satisfies (21). Since the choice of $W$ only depends on the previous iterations of the algorithm (that is, on the algorithm's input and on the random coin tosses made before $T^s$ is chosen), we can use Lemma 33 to ensure that (21) holds with high probability. In other words, we can rephrase Lemma 21 as follows.


Lemma 34 (Analogue of Lemma 21). For every constant $\gamma > 0$, there are $\epsilon_0 > 0$ and $C > 0$ depending only on $\gamma$ such that, if $\epsilon' \le \epsilon_0$, the following holds. Suppose that for some $s$,

$$\|x - x^s\|_1 > C\|x - H_k(x)\|_1. \qquad (32)$$

Then, with probability at least $1 - \exp(-\epsilon'^2 q/4)$, there is a $t$ such that

$$\|x - (x^s + \Delta^{s,t})\|_1 \le \gamma\|x - x^s\|_1.$$

□


Declare a bad event at stage $s$ if the condition $\|x - x^s\|_1 > C\|x - H_k(x)\|_1$ holds but the conclusion of the lemma does not, because of unfortunate random coin tosses by the algorithm. By a union bound, the probability that any such bad event happens throughout the algorithm is at most $s_0\exp(-\epsilon'^2 q/4)$.

Next we show an analogue of Proposition 26 for the randomized algorithm. 

Proposition 35. Let $x', x'' \in \mathbb{R}^N$ be fixed $(3k)$-sparse vectors and let $T$ be a multiset of $q$ elements in $[D]$ chosen uniformly and independently at random. Moreover, assume

$$\|M^T(x - x')\|_1 \le \|M^T(x - x'')\|_1.$$

Then, with probability at least $1 - 2\exp(-\epsilon'^2 q/4)$ over the choice of $T$, we have

$$\|x - x'\|_1 \le \Big(1 + \frac{3}{1 - c_0\epsilon}\Big)\|x - H_k(x)\|_1 + \frac{1}{1 - c_0\epsilon}\cdot\|x - x''\|_1,$$

where $c_0$ is the constant in Theorem 5. In particular, when $c_0\epsilon < 1/2$, we have (with the above-mentioned probability bound)

$$\|x - x'\|_1 \le 8\|x - H_k(x)\|_1 + 3\|x - x''\|_1.$$

Proof. The proof is the same as the original proof of Proposition 26. The only difference is observing that the argument is valid provided that the RIP-1 condition holds for the two particular $(4k)$-sparse vectors $H_k(x) - x''$ and $H_k(x) - x'$ (as used in (9) and (13)). Moreover, the proof of Theorem 5 only uses the expansion property of the underlying expander graph for the particular support of the sparse vector being considered, and holds as long as the expansion is satisfied for this particular choice. By applying Lemma 33 twice, on the supports of $H_k(x) - x''$ and $H_k(x) - x'$, and taking a union bound, we see that the required expansion is available with probability at least $1 - 2\exp(-\epsilon'^2 q/4)$, and thus the claim follows. □
Using this tool, we can now show the following analogue of Corollary 27.

Corollary 36. For every constant $\gamma_0 > 0$, there is an $\epsilon_0$ depending only on $\gamma_0$ such that, if $\epsilon \le \epsilon_0$, the following holds. Assume condition (32) of Lemma 34 holds. Then, with probability at least $1 - 2\exp(-\epsilon'^2 q/4)$ over the choice of $T'^s$, we have

$$\|x - x^{s+1}\|_1 \le \gamma_0\|x - x^s\|_1.$$

Proof. The proof is essentially the same as the proof of Corollary 27. The only difference is that, instead of $\|Mx - M(x^s + \Delta^{s,t})\|_1$, the quantity $\|M^{T'^s}x - M^{T'^s}(x^s + \Delta^{s,t})\|_1$ that is used in the randomized algorithm is considered, and Proposition 35 is used instead of Proposition 26. To ensure that we can use Proposition 35, we use the fact that the particular choices of the vectors $x'$ and $x''$ with which we instantiate Proposition 35 (respectively, the vectors $x^s + \Delta^{s,t_0}$ and $x^s + \Delta^{s,t}$ in the proof of Corollary 27) only depend on the algorithm's input and the random coin tosses determining $T^0, \ldots, T^s$ and $T'^0, \ldots, T'^{s-1}$, and not on $T'^s$. □

Again, declare a bad event at stage $s$ if the condition $\|x - x^s\|_1 > C\|x - H_k(x)\|_1$ holds but the conclusion of Corollary 36 does not, because of unfortunate coin tosses in the choice of $T'^s$. As before, by a union bound, the probability that any such bad event happens throughout the algorithm is at most $2s_0\exp(-\epsilon'^2 q/4)$.

Since the initial approximation is $x^0 = 0$ (with error $\|x - x^0\|_1 = \|x\|_1$), assuming $\gamma_0 \le 1/2$, condition (4) is satisfied for some $s \le \log(1/\nu)$, provided that no bad event happens in the first $s$ iterations. By the above union bounds, this is the case with probability at least $1 - 3s_0\exp(-\epsilon'^2 q/4)$.

Let $x^*$ be the estimate computed in Line 17 of procedure Recover'. We can conclude the analysis in a similar way to the proof of Theorem 19 by one final use of Proposition 35, as follows. By Proposition 35, assuming that no bad event ever occurs, with probability at least $1 - 2\exp(-\epsilon'^2 q/4)$ we have

$$\|x - x^*\|_1 \le 8\|x - H_k(x)\|_1 + 3\|x - x^s\|_1 \le C'\|x - H_k(x)\|_1 + 3\nu\|x\|_1, \qquad (33)$$

where $x^s$ is an iterate satisfying (4) and we define $C' := 3C + 8$; running the algorithm with $\nu/3$ in place of $\nu$ turns (33) into a guarantee of the form (4).

Altogether, by a final union bound, we conclude that the desired bound (33) holds with probability at least $1 - \eta$ for some choice of $q = O(\log(s_0/\eta)/\epsilon'^2) = O(\log(s_0/\eta))$.
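The parameter choice can be made concrete as follows. Collecting the union bounds above ($s_0$, $2s_0$ and $2$ times $\exp(-\epsilon'^2 q/4)$ for the three kinds of bad events), the total failure probability is at most $(3s_0 + 2)\exp(-\epsilon'^2 q/4)$. The Python sketch below (our own, with hypothetical function names) picks $s_0 = \lceil\log_2(1/\nu)\rceil$, which suffices when $\gamma_0 \le 1/2$, and the smallest integer $q$ that makes this bound at most $\eta$; the value of $\epsilon'$ in the example is arbitrary.

```python
import math
from typing import Tuple


def choose_parameters(nu: float, eta: float, eps_prime: float) -> Tuple[int, int]:
    """Return (s0, q): s0 halvings reach error nu, and the failure bound is <= eta."""
    s0 = math.ceil(math.log2(1 / nu))
    # smallest q with (3*s0 + 2) * exp(-eps'^2 * q / 4) <= eta
    q = math.ceil(4 * math.log((3 * s0 + 2) / eta) / eps_prime ** 2)
    return s0, q


def failure_bound(s0: int, q: int, eps_prime: float) -> float:
    """Union bound over all bad events: (3*s0 + 2) * exp(-eps'^2 * q / 4)."""
    return (3 * s0 + 2) * math.exp(-eps_prime ** 2 * q / 4)


s0, q = choose_parameters(nu=2 ** -20, eta=0.01, eps_prime=0.1)
```

For a constant $\epsilon'$, the resulting $q$ grows like $\log(s_0/\eta)$, matching the bound stated above.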

5.1.2 Analysis of the running time of the randomized sparse recovery algorithm 

The analysis of the running time of procedure Recover' in Figure 2 is similar to Section 4.3. As written in Figure 2, the algorithm may not achieve the promised running time, since the sketch length may itself be larger than the desired running time. Thus, we point out that the sketch is implicitly given to the algorithm as an oracle, and the algorithm queries the sketch as needed throughout its execution. The same holds for the initialization step in Line 4 of procedure Recover', which need not be performed explicitly by the algorithm.

To optimize the running time, the algorithm stores vectors in a sparse representation, i.e., it maintains the support of each vector along with the values at the corresponding positions.

As discussed in Section 4.3, each invocation of procedure Search takes $O(n)$ arithmetic operations, and procedure Estimate takes $O(r2^r + kf(n)) = O(2^r f(n))$ operations (using naive sorting to find the largest coefficients and noting that $2^r \ge k$ and $f(n) = \Omega(n) = \Omega(r)$).

We observe that for every $k$-sparse $w \in \mathbb{R}^N$ and $t \in [D]$, computing the product $M^t \cdot w$ (which itself has at most $k$ nonzero entries) takes $O(kf(n))$ operations: $k$ invocations of the condenser function, one for each nonzero entry of $w$, each time adding the corresponding entry of $w$ to the correct position in the result vector. Note that the indexing time for updating an entry of the resulting vector is logarithmic in its length $2^r$, i.e., $r \le n$, and thus the required indexing time is absorbed into the above asymptotic bound since $f(n) = \Omega(n)$. Moreover, without affecting the above running time, we can in fact compute $(M^t \otimes B) \cdot w$: for each $i \in [N]$ in the support of $w$, the corresponding $w(i)$ is added to a subset of the copies of $M^t$ depending on the bit representation of $i$, so the additional computation per entry in the support of $w$ is $O(n)$, which is absorbed in the time $f(n) = \Omega(n)$ needed to compute the condenser function. Altogether, computing $(M^t \otimes B) \cdot w$ can be done with $O(kf(n))$ arithmetic operations.
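The computation just described can be sketched in Python under an explicit assumption: we take $B$ to be the bit-test matrix whose $j$-th row records the $j$-th bit of each column index (the text only states that $w(i)$ is added to a subset of the copies of $M^t$ determined by the bit representation of $i$). The function name and the dictionary-based layout are hypothetical, and the condenser $h(\cdot, t)$ is passed in as a callable.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

SparseVec = Dict[int, float]  # index -> nonzero value


def sketch_with_bit_tests(w: SparseVec, h_t: Callable[[int], int], n: int
                          ) -> Tuple[Dict[int, float], Dict[Tuple[int, int], float]]:
    """Compute M^t w and an (assumed) bit-test sketch of w with k evaluations of h_t.

    buckets[b] is the sum of w[i] over i with h_t(i) == b; bits[(b, j)] is the
    same sum restricted to indices i whose j-th bit is 1.
    """
    buckets: Dict[int, float] = defaultdict(float)
    bits: Dict[Tuple[int, int], float] = defaultdict(float)
    for i, v in w.items():   # k iterations, one per nonzero entry of w
        b = h_t(i)           # one evaluation of the condenser function
        buckets[b] += v
        for j in range(n):   # O(n) extra work per entry
            if (i >> j) & 1:
                bits[(b, j)] += v
    return dict(buckets), dict(bits)


def toy_h(i: int) -> int:
    """Hypothetical F_2-linear stand-in for h(., t): XOR, shift and mask."""
    return (i ^ (i >> 2)) & 0b11


buckets, bits = sketch_with_bit_tests({5: 2.0, 12: -1.0, 7: 3.0}, toy_h, n=4)
```

Only the buckets and bit positions that actually receive mass are stored, in line with the sparse representation described above.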

Since procedure Recover' loops $q$ times instead of $D$ times in each of the $s_0$ iterations, each loop taking time $O(2^r f(n))$, the algorithm requires $O(2^r q s_0 f(n))$ arithmetic operations in total. Now we can plug in the values of $q$ and $s_0$ from the analysis in the previous section and upper bound the number of operations performed by the algorithm by

$$O\big(2^r\cdot\log(\log(1/\nu)/\eta)\cdot\log(1/\nu)\cdot f(n)\big).$$

This completes the running time analysis of the algorithm in Figure 2. 
A Proof of Theorem 12 (construction of the lossless condenser)

In this appendix, we include a proof of Theorem 12 from [3]. The first step is to recall the original framework for the construction of lossless condensers in [8], which is depicted in Construction 1. The construction is defined with respect to a prime power alphabet size $q$ and an integer parameter $u > 1$.

• Given: A random sample $X \sim \mathcal{X}$, where $\mathcal{X}$ is a distribution on $\mathbb{F}_q^n$ with min-entropy at least $\kappa$, and a uniformly distributed random seed $Z$ over $\mathbb{F}_q$.

• Output: A vector $C(X, Z)$ of length $\ell$ over $\mathbb{F}_q$.

• Construction: Take any irreducible univariate polynomial $g$ of degree $n$ over $\mathbb{F}_q$, and interpret the input $X$ as the coefficient vector of a random univariate polynomial $F$ of degree $n - 1$ over $\mathbb{F}_q$. Then, for an integer parameter $u$, the output is given by

$$C(X, Z) := (F(Z), F_1(Z), \ldots, F_{\ell-1}(Z)),$$

where we have used the shorthand $F_i := F^{u^i} \bmod g$.

Construction 1: Guruswami-Umans-Vadhan's condenser $C: \mathbb{F}_q^n \times \mathbb{F}_q \to \mathbb{F}_q^{\ell}$.
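Construction 1 can be prototyped directly. The Python sketch below is our own illustration, restricted for simplicity to a prime field size $q$ (the construction itself allows prime powers); polynomials are coefficient lists, lowest degree first, and all function names are hypothetical. It computes $F_{i+1} = F_i^u \bmod g$ by repeated squaring and evaluates each $F_i$ at the seed $Z$ by Horner's rule.

```python
from typing import List

Poly = List[int]  # coefficients over F_q, lowest degree first


def poly_mulmod(a: Poly, b: Poly, g: Poly, q: int) -> Poly:
    """Return a*b mod g over F_q (q prime, g monic of degree n), as n coefficients."""
    n = len(g) - 1
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % q
    for k in range(len(prod) - 1, n - 1, -1):  # cancel the X^k terms with k >= n
        c = prod[k]
        if c:
            for j in range(n + 1):
                prod[k - n + j] = (prod[k - n + j] - c * g[j]) % q
    return (prod + [0] * n)[:n]


def poly_powmod(f: Poly, e: int, g: Poly, q: int) -> Poly:
    """Return f^e mod g by square-and-multiply."""
    result, base = [1], f
    while e:
        if e & 1:
            result = poly_mulmod(result, base, g, q)
        base = poly_mulmod(base, base, g, q)
        e >>= 1
    return result


def evaluate(f: Poly, z: int, q: int) -> int:
    """Horner evaluation of f at z over F_q."""
    acc = 0
    for c in reversed(f):
        acc = (acc * z + c) % q
    return acc


def guv_condense(x: Poly, z: int, g: Poly, u: int, ell: int, q: int) -> List[int]:
    """C(x, z) = (F(z), F_1(z), ..., F_{ell-1}(z)) with F_i = F^{u^i} mod g."""
    F = list(x)  # deg F <= n - 1, so F mod g = F
    out = []
    for _ in range(ell):
        out.append(evaluate(F, z, q))
        F = poly_powmod(F, u, g, q)  # F_{i+1} = F_i^u mod g
    return out


g = [5, 0, 0, 1]  # X^3 + 5 = X^3 - 2, irreducible over F_7 since 2 is not a cube mod 7
y = guv_condense([1, 4, 6], z=3, g=g, u=7, ell=3, q=7)
```

With $q = 7$ and $u = 7$, the map `x -> guv_condense(x, z, ...)` is $\mathbb{F}_7$-linear for each fixed seed `z`, in line with the linearity argument in the proof of Corollary 38; no claim about the condensing error is intended for such tiny parameters.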

The following key result about Construction 1 is proved in [ 8 ]: 

Theorem 37 [8]. For any $\kappa > 0$, the mapping defined in Construction 1 is a $(\kappa, \epsilon)$-lossless condenser with error $\epsilon := (n - 1)(u - 1)\ell/q$, provided that $\ell \ge \kappa/\log u$.

By a careful choice of the parameters, the condenser can be made linear, as observed by Cheraghchi [3]. We quote this result, which is a restatement of Theorem 12, below.

Corollary 38 [3]. Let $p$ be a fixed prime power and $\alpha > 0$ be an arbitrary constant. Then, for parameters $n \in \mathbb{N}$, $\kappa \le n\log p$, and $\epsilon > 0$, there is an explicit $(\kappa, \epsilon)$-lossless condenser $h: \mathbb{F}_p^n \times \{0,1\}^d \to \mathbb{F}_p^r$ with $d \le (1 + 1/\alpha)(\log(n\kappa/\epsilon) + O(1))$ and output length satisfying $r\log p \le d + (1 + \alpha)\kappa$. Moreover, $h$ is a linear function (over $\mathbb{F}_p$) for every fixed choice of the second parameter.

Proof. We set up the parameters of the condenser $C$ given by Construction 1 and apply Theorem 37. The range of the parameters is mostly similar to that chosen in the original result of Guruswami et al. [8].

Letting $u_0 := (2p^2 n\kappa/\epsilon)^{1/\alpha}$, we take $u$ to be an integer power of $p$ in the range $[u_0, pu_0]$. Also, let $\ell := \lceil\kappa/\log u\rceil$, so that the condition $\ell \ge \kappa/\log u$ required by Theorem 37 is satisfied. Finally, let $q_0 := nu\ell/\epsilon$ and choose the field size $q$ to be an integer power of $p$ in the range $[q_0, pq_0]$.
We choose the input length of the condenser $C$ to be equal to $n$. Note that $C$ is defined over $\mathbb{F}_q$, while we need a condenser over $\mathbb{F}_p$. Since $q$ is a power of $p$, $\mathbb{F}_p$ is a subfield of $\mathbb{F}_q$. For $x \in \mathbb{F}_p^n$ and $z \in \{0,1\}^d$, let $y := C(x, z) \in \mathbb{F}_q^{\ell}$, where $x$ is regarded as a vector over the extension $\mathbb{F}_q$ of $\mathbb{F}_p$. We define the output $h(x, z)$ of the condenser to be the vector $y$ regarded as a vector of length $\ell\log_p q$ over $\mathbb{F}_p$ (by expanding each element of $\mathbb{F}_q$ as a vector of length $\log_p q$ over $\mathbb{F}_p$). Clearly, $h$ is a $(\kappa, \epsilon)$-condenser if $C$ is.

By Theorem 37, $C$ is a lossless condenser with error upper bounded by

$$\frac{(n - 1)(u - 1)\ell}{q} \le \frac{nu\ell}{q_0} = \epsilon.$$

It remains to analyze the seed length $d$ and the output length $r$ of the condenser. For the output length of the condenser, we have

$$r\log p = \ell\log q \le (1 + \kappa/\log u)\log q \le d + \kappa(\log q)/(\log u),$$

where the last inequality is due to the fact that $d = \lceil\log q\rceil$. Thus, in order to show the desired upper bound on the output length, it suffices to show that $\log q \le (1 + \alpha)\log u_0$ (recall that $u \ge u_0$). We have

$$\log q \le \log(pq_0) = \log(pnu\ell/\epsilon) \le \log u_0 + \log(p^2 n\ell/\epsilon),$$

and our task is reduced to showing that $p^2 n\ell/\epsilon \le u_0^{\alpha} = 2p^2 n\kappa/\epsilon$, that is, $\ell \le 2\kappa$. This bound is clearly valid by the choice $\ell \le 1 + \kappa/\log u$.

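Now, $d = \lceil\log q\rceil$, for which we have

$$
\begin{aligned}
d &\le \log q + 1 \le \log q_0 + O(1)\\
&\le \log(nu_0\ell/\epsilon) + O(1)\\
&\le \log(nu_0\kappa/\epsilon) + O(1)\\
&\le \log(n\kappa/\epsilon) + \frac{1}{\alpha}\log(2p^2 n\kappa/\epsilon) + O(1)\\
&\le \Big(1 + \frac{1}{\alpha}\Big)\big(\log(n\kappa/\epsilon) + O(1)\big),
\end{aligned}
$$

as desired.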
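The parameter choices in this proof can be checked numerically. The following Python sketch (hypothetical function name; $p$ is assumed to be a prime power and logarithms are to base 2) computes $u$, $\ell$, $q$, the seed length $d = \lceil\log q\rceil$, the output length $r = \ell\log_p q$ over $\mathbb{F}_p$, and the error bound $(n-1)(u-1)\ell/q$ of Theorem 37.

```python
import math


def guv_linear_parameters(p: int, n: int, kappa: int, eps: float, alpha: float) -> dict:
    """Parameters from the proof of Corollary 38 (a sketch; p is a prime power)."""
    u0 = (2 * p ** 2 * n * kappa / eps) ** (1 / alpha)
    u = 1
    while u < u0:        # smallest power of p that is >= u0, hence in [u0, p*u0]
        u *= p
    ell = math.ceil(kappa / math.log2(u))
    q0 = n * u * ell / eps
    q, m = 1, 0
    while q < q0:        # smallest power of p that is >= q0; q = p^m
        q *= p
        m += 1
    d = math.ceil(math.log2(q))   # seed length in bits
    r = ell * m                   # output length over F_p, i.e. ell * log_p(q)
    error = (n - 1) * (u - 1) * ell / q
    return {"u": u, "ell": ell, "q": q, "d": d, "r": r, "error": error}


params = guv_linear_parameters(p=2, n=64, kappa=10, eps=0.1, alpha=1.0)
```

Smaller values of $\alpha$ shorten the output at the cost of a longer seed, as the bounds $d \le (1 + 1/\alpha)(\log(n\kappa/\epsilon) + O(1))$ and $r\log p \le d + (1 + \alpha)\kappa$ indicate.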

Since $\mathbb{F}_q$ has a fixed characteristic, an efficient deterministic algorithm for the representation and manipulation of the field elements is available [17], which implies that the condenser is polynomial-time computable and thus explicit.

Moreover, since $u$ is taken as an integer power of $p$ and $\mathbb{F}_q$ is an extension of $\mathbb{F}_p$, for any choice of polynomials $F, F', G \in \mathbb{F}_q[X]$, subfield elements $a, b \in \mathbb{F}_p$, and integer $i \ge 0$, we have

$$(aF + bF')^{u^i} \equiv aF^{u^i} + bF'^{u^i} \pmod{G},$$

meaning that raising a polynomial to the power $u^i$ is an $\mathbb{F}_p$-linear operation. Therefore, the mapping $C$ that defines the condenser (Construction 1) is $\mathbb{F}_p$-linear for every fixed seed. This in turn implies that the final condenser $h$ is linear, as claimed. □
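The identity above is an instance of the Frobenius map. Writing $p_0$ for the characteristic of $\mathbb{F}_q$ (so that $p$, $u$ and $q$ are all powers of the prime $p_0$), the step can be spelled out for arbitrary $A, B, F \in \mathbb{F}_q[X]$ and $a \in \mathbb{F}_p$; reducing both sides modulo $G$ preserves the resulting identity, since reduction modulo $G$ is itself linear.

$$
\begin{aligned}
(A + B)^{p_0} &= \sum_{j=0}^{p_0}\binom{p_0}{j}A^{j}B^{p_0 - j} = A^{p_0} + B^{p_0}, && \text{since } p_0 \mid \tbinom{p_0}{j} \text{ for } 0 < j < p_0,\\
(A + B)^{u^i} &= A^{u^i} + B^{u^i}, && \text{by iterating, as } u^i \text{ is a power of } p_0,\\
(aF)^{u^i} &= a^{u^i}F^{u^i} = aF^{u^i}, && \text{since } a^{p} = a \text{ for } a \in \mathbb{F}_p \text{ and } u^i \text{ is a power of } p.
\end{aligned}
$$

Taking $A = aF$ and $B = bF'$ yields $(aF + bF')^{u^i} = aF^{u^i} + bF'^{u^i}$ in $\mathbb{F}_q[X]$.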
B The Leftover Hash Lemma


The Leftover Hash Lemma (first stated by Impagliazzo, Levin, and Luby [10]) is a basic and classical result in computational complexity, which is normally stated in terms of randomness extractors. However, it is easy to observe that the same technique can be used to construct linear lossless condensers with optimal output length (albeit with a large seed length). In other words, the lemma shows that any universal family of linear hash functions can be turned into a linear extractor or lossless condenser. For completeness, in this section we include a proof of this fact.

Definition 39. A family of functions $\mathcal{H} = \{h_1, \ldots, h_D\}$, where $h_t: \{0,1\}^n \to \{0,1\}^r$ for $t = 1, \ldots, D$, is called universal if, for every fixed choice of $x, x' \in \{0,1\}^n$ such that $x \neq x'$ and a uniformly random $t \in [D] := \{1, \ldots, D\}$, we have

$$\Pr[h_t(x) = h_t(x')] \le 2^{-r}.$$

One of the basic examples of universal hash families is what we call the linear family, defined as follows. Consider an arbitrary isomorphism $\varphi: \mathbb{F}_2^n \to \mathbb{F}_{2^n}$ between the vector space $\mathbb{F}_2^n$ and the extension field $\mathbb{F}_{2^n}$, and let $0 < r \le n$ be an arbitrary integer. The linear family $\mathcal{H}_{\mathrm{lin}}$ is the set $\{h_\beta : \beta \in \mathbb{F}_{2^n}\}$ of size $2^n$ that contains a function for each element of the extension field $\mathbb{F}_{2^n}$. For each $\beta$, the mapping $h_\beta$ is given by

$$h_\beta(x) := (y_1, \ldots, y_r), \quad \text{where } (y_1, \ldots, y_n) := \varphi^{-1}(\beta\cdot\varphi(x)).$$

Observe that each function can be expressed as a linear mapping from $\mathbb{F}_2^n$ to $\mathbb{F}_2^r$. Below we show that this family is universal.

Proposition 40. The linear family $\mathcal{H}_{\mathrm{lin}}$ defined above is universal.

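Proof. Let $x, x'$ be different elements of $\mathbb{F}_2^n$. Consider the mapping $f: \mathbb{F}_{2^n} \to \mathbb{F}_2^r$ defined as

$$f(y) := (y_1, \ldots, y_r), \quad \text{where } (y_1, \ldots, y_n) := \varphi^{-1}(y),$$

which truncates the binary representation of a field element from $\mathbb{F}_{2^n}$ to $r$ bits; note that $f$ is $\mathbb{F}_2$-linear. The probability we are trying to estimate in Definition 39 is, for a uniformly random $\beta \in \mathbb{F}_{2^n}$,

$$\Pr_\beta[f(\beta\cdot\varphi(x)) = f(\beta\cdot\varphi(x'))] = \Pr_\beta[f(\beta\cdot(\varphi(x) - \varphi(x'))) = 0].$$

But note that $\varphi(x) - \varphi(x')$ is a nonzero element of $\mathbb{F}_{2^n}$, and thus, for a uniformly random $\beta$, the random variable $\beta\cdot(\varphi(x) - \varphi(x'))$ is uniformly distributed on $\mathbb{F}_{2^n}$. It follows that

$$\Pr_\beta[f(\beta\cdot(\varphi(x) - \varphi(x'))) = 0] = 2^{-r},$$

implying that $\mathcal{H}_{\mathrm{lin}}$ is a universal family. □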
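For small $n$, Proposition 40 can be verified exhaustively. The Python sketch below is our own; it assumes the polynomial-basis representation (with $\varphi$ the identity on bit strings), truncation to the $r$ low-order bits, and the irreducible modulus $X^4 + X + 1$. For every pair $x \neq x'$ it counts the elements $\beta$ with $h_\beta(x) = h_\beta(x')$; by the argument above, each count is exactly $2^{n-r}$, so the collision probability is exactly $2^{-r}$.

```python
from typing import Set


def gf2n_mul(a: int, b: int, n: int, modulus: int) -> int:
    """Shift-and-add multiplication in F_{2^n} = F_2[X]/(modulus)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= modulus
    return result


def collision_counts(n: int, r: int, modulus: int) -> Set[int]:
    """Set of values #{beta : h_beta(x) == h_beta(x')} over all pairs x != x'."""
    mask = (1 << r) - 1
    counts = set()
    for x in range(1 << n):
        for x2 in range(x + 1, 1 << n):
            c = sum(1 for beta in range(1 << n)
                    if (gf2n_mul(beta, x, n, modulus) & mask)
                    == (gf2n_mul(beta, x2, n, modulus) & mask))
            counts.add(c)
    return counts


counts = collision_counts(4, 2, 0b10011)  # modulus X^4 + X + 1
```

With $n = 4$ and $r = 2$, every pair collides for exactly $4 = 2^{n-r}$ of the $16$ choices of $\beta$.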
Now we are ready to state and prove the Leftover Hash Lemma (focusing on the special case of lossless condensers).

Theorem 41 (Leftover Hash Lemma). Let $\mathcal{H} = \{h_t: \mathbb{F}_2^n \to \mathbb{F}_2^r \mid t \in [D]\}$ be a universal family of hash functions with $D$ elements, and define the function $h: \mathbb{F}_2^n \times [D] \to \mathbb{F}_2^r$ as $h(x, t) := h_t(x)$. Then, for every $\kappa, \epsilon$ such that $r \ge \kappa + 2\log(1/\epsilon)$, the function $h$ is a $(\kappa, \epsilon)$-lossless condenser. In particular, by choosing $\mathcal{H} = \mathcal{H}_{\mathrm{lin}}$, it is possible to get explicit extractors and lossless condensers with $D = 2^n$.


Proof. Recall that by Definition 6 we need to show that, for any distribution $\mathcal{X}$ over $\mathbb{F}_2^n$ with min-entropy at least $\kappa$, a random variable $X$ drawn from $\mathcal{X}$, and an independent random variable $Z$ uniformly drawn from $[D]$, the distribution of $h(X, Z)$ is $\epsilon$-close in statistical distance to a distribution with min-entropy at least $\kappa$. By a convexity argument, it suffices to show the claim when $\mathcal{X}$ is the uniform distribution on a set $\mathrm{supp}(\mathcal{X})$ of size $K := 2^{\kappa}$ (moreover, in this paper we only use the lemma for such distributions).

Define $R := 2^r$ and $D := 2^d$, and denote by $\gamma$ the distribution of $(Z, h(X, Z))$ over $[D] \times \mathbb{F}_2^r$. Let $\mu$ be any distribution uniformly supported on some set $\mathrm{supp}(\mu) \subseteq [D] \times \mathbb{F}_2^r$ such that $\mathrm{supp}(\gamma) \subseteq \mathrm{supp}(\mu)$. We will first upper bound the $\ell_2$ distance of the two distributions $\gamma$ and $\mu$ (i.e., the $\ell_2$ difference of the probability vectors defining the two distributions), which can be expressed as follows (we use the notation $\gamma(x)$ for the probability assigned to $x$ by $\gamma$, and similarly $\mu(x)$):


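$$\|\gamma - \mu\|_2^2 = \sum_{x \in [D]\times\mathbb{F}_2^r}(\gamma(x) - \mu(x))^2 \stackrel{(a)}{=} \sum_{x}\gamma(x)^2 - \frac{2}{|\mathrm{supp}(\mu)|} + \frac{|\mathrm{supp}(\mu)|}{|\mathrm{supp}(\mu)|^2} = \sum_{x}\gamma(x)^2 - \frac{1}{|\mathrm{supp}(\mu)|}, \qquad (34)$$

where (a) uses the fact that $\mu$ assigns probability $1/|\mathrm{supp}(\mu)|$ to exactly $|\mathrm{supp}(\mu)|$ elements of $[D] \times \mathbb{F}_2^r$ (including all elements of $\mathrm{supp}(\gamma)$) and zero elsewhere.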

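Now observe that $\gamma(x)^2$ is the probability that two independent samples drawn from $\gamma$ both turn out to be equal to $x$, and thus $\sum_x \gamma(x)^2$ is the collision probability of two independent samples from $\gamma$, which can be written as

$$\sum_x \gamma(x)^2 = \Pr_{Z, Z', X, X'}\big[(Z, h(X, Z)) = (Z', h(X', Z'))\big],$$

where the random variables $Z, Z'$ are uniformly and independently sampled from $[D]$ and $X, X'$ are independently sampled from $\mathcal{X}$. We can rewrite the collision probability as

$$
\begin{aligned}
\sum_x \gamma(x)^2 &= \Pr[Z = Z'] \cdot \Pr[h(X, Z) = h(X', Z') \mid Z = Z']\\
&= \frac{1}{D}\Big(\Pr[X = X'] + \frac{1}{K^2}\sum_{\substack{x, x' \in \mathrm{supp}(\mathcal{X})\\ x \neq x'}}\Pr_Z[h_Z(x) = h_Z(x')]\Big)\\
&\stackrel{(b)}{\le} \frac{1}{D}\Big(\frac{1}{K} + \frac{1}{K^2}\sum_{\substack{x, x' \in \mathrm{supp}(\mathcal{X})\\ x \neq x'}}\frac{1}{R}\Big)\\
&\le \frac{1}{DK} + \frac{1}{DR},
\end{aligned}
$$

where (b) uses the assumption that $\mathcal{H}$ is a universal hash family (together with $\Pr[X = X'] = 1/K$). Plugging this bound into (34) implies that

$$\|\gamma - \mu\|_2^2 \le \frac{1}{DK} + \frac{1}{DR} - \frac{1}{|\mathrm{supp}(\mu)|}.$$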

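Observe that both $\gamma$ and $\mu$ assign zero probabilities to elements of $[D] \times \mathbb{F}_2^r$ outside the support of $\mu$. Thus, using the Cauchy-Schwarz inequality on a domain of size $|\mathrm{supp}(\mu)|$, the above bound implies that the statistical distance between $\gamma$ and $\mu$ is at most

$$\frac{1}{2}\|\gamma - \mu\|_1 \le \frac{1}{2}\sqrt{|\mathrm{supp}(\mu)|}\,\|\gamma - \mu\|_2 \le \frac{1}{2}\sqrt{\frac{|\mathrm{supp}(\mu)|}{DK} + \frac{|\mathrm{supp}(\mu)|}{DR} - 1}. \qquad (35)$$

Now, we specialize $\mu$ to any distribution that is uniformly supported on a set of size $DK$ containing $\mathrm{supp}(\gamma)$ (note that, since $\mathcal{X}$ is assumed to be uniformly distributed on its support, $\gamma$ must have a support of size at most $DK$). For this choice, (35) bounds the statistical distance by $\frac{1}{2}\sqrt{K/R}$. Since $r \ge \kappa + 2\log(1/\epsilon)$, we have $K \le \epsilon^2 R$, and thus $\gamma$ and $\mu$ are $\epsilon$-close (in fact, $(\epsilon/2)$-close) in statistical distance.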
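The bounds in this proof can be checked exactly on a small instance. The Python sketch below is an illustration with hypothetical names. It assumes $\mathcal{H} = \mathcal{H}_{\mathrm{lin}}$ with the modulus $X^4 + X + 1$ (so $D = 16$) and truncation to the $r$ low-order bits. With exact rational arithmetic, it computes the distribution $\gamma$ of $(Z, h(X, Z))$ for $X$ uniform on a set $S$, its collision probability, and its statistical distance to a uniform distribution on $DK$ points containing $\mathrm{supp}(\gamma)$. That distance does not depend on which extra points are added to $\mathrm{supp}(\gamma)$.

```python
from fractions import Fraction
from typing import Dict, Sequence, Tuple


def gf2n_mul(a: int, b: int, n: int, modulus: int) -> int:
    """Shift-and-add multiplication in F_{2^n} = F_2[X]/(modulus)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= modulus
    return result


def condenser_distribution(S: Sequence[int], n: int, r: int,
                           modulus: int) -> Dict[Tuple[int, int], Fraction]:
    """Exact distribution of (Z, h(X, Z)) for X uniform on S and Z uniform on F_{2^n}."""
    D, mask = 1 << n, (1 << r) - 1
    gamma: Dict[Tuple[int, int], Fraction] = {}
    for z in range(D):
        for x in S:
            key = (z, gf2n_mul(z, x, n, modulus) & mask)
            gamma[key] = gamma.get(key, Fraction(0)) + Fraction(1, D * len(S))
    return gamma


def distance_to_uniform_superset(gamma: Dict[Tuple[int, int], Fraction], size: int) -> Fraction:
    """Statistical distance from gamma to a uniform mu on `size` points containing supp(gamma)."""
    assert len(gamma) <= size
    p = Fraction(1, size)
    inside = sum(abs(g - p) for g in gamma.values())
    outside = (size - len(gamma)) * p  # mass of mu where gamma vanishes
    return (inside + outside) / 2


gamma = condenser_distribution([1, 6], n=4, r=3, modulus=0b10011)  # K = 2, R = 8
collision = sum(g * g for g in gamma.values())
sd = distance_to_uniform_superset(gamma, 16 * 2)
```

In this instance the collision probability stays below $1/(DK) + 1/(DR)$ and the distance stays below $\frac{1}{2}\sqrt{K/R}$, as the proof predicts; the actual distance is noticeably smaller than the bound.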